Message ID | 130a3b7ef4788baae3a6fe71293ab17442bc9a0a.1442793572.git.luto@kernel.org (mailing list archive) |
---|---|
State | New, archived |
On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> We still write a pr_info entry unconditionally for debugging.

No, this is wrong.

If you really want to do something like this, then just make all MSR
reads safe. So the only difference between "safe" and "unsafe" is that
the unsafe version just doesn't check the return value, and silently
just returns zero for reads (or writes nothing).

To quote Obi-Wan: "Use the exception table, Luke".

Because decoding instructions is just too ugly. We'll do it for CPU
errata where we might have to do it for user space code too (ie the
AMD prefetch mess), but for code that _we_ control? Hell no.

So NAK on this.

Linus
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sep 20, 2015 5:15 PM, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>
> On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > We still write a pr_info entry unconditionally for debugging.
>
> No, this is wrong.
>
> If you really want to do something like this, then just make all MSR
> reads safe. So the only difference between "safe" and "unsafe" is that
> the unsafe version just doesn't check the return value, and silently
> just returns zero for reads (or writes nothing).
>
> To quote Obi-Wan: "Use the exception table, Luke".
>
> Because decoding instructions is just too ugly. We'll do it for CPU
> errata where we might have to do it for user space code too (ie the
> AMD prefetch mess), but for code that _we_ control? Hell no.
>
> So NAK on this.

My personal preference is to just not do this at all. A couple people
disagree. If we make the unsafe variants not oops, then I think we
want to have the nice loud warning, since these issues are bugs if
they happen.

We could certainly use the exception table for this, but it'll result
in bigger code, since each MSR access will need an exception table
entry and an associated fixup to call some helper that warns and sets
the result to zero.

I'd be happy to implement that, but only if it'll be applied.
Otherwise I'd rather just drop this patch and keep the rest of the
series.

--Andy
* Andy Lutomirski <luto@amacapital.net> wrote:

> On Sep 20, 2015 5:15 PM, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > > We still write a pr_info entry unconditionally for debugging.
> >
> > No, this is wrong.
> >
> > If you really want to do something like this, then just make all MSR reads
> > safe. So the only difference between "safe" and "unsafe" is that the unsafe
> > version just doesn't check the return value, and silently just returns zero
> > for reads (or writes nothing).
> >
> > To quote Obi-Wan: "Use the exception table, Luke".
> >
> > Because decoding instructions is just too ugly. We'll do it for CPU errata
> > where we might have to do it for user space code too (ie the AMD prefetch
> > mess), but for code that _we_ control? Hell no.
> >
> > So NAK on this.
>
> My personal preference is to just not do this at all. A couple people disagree.
> If we make the unsafe variants not oops, then I think we want to have the nice
> loud warning, since these issues are bugs if they happen.
>
> We could certainly use the exception table for this, but it'll result in bigger
> code, since each MSR access will need an exception table entry and an associated
> fixup to call some helper that warns and sets the result to zero.
>
> I'd be happy to implement that, but only if it'll be applied. Otherwise I'd
> rather just drop this patch and keep the rest of the series.

Linus, what's your preference?

Due to the bug mentioned earlier in this thread all MSR reads are currently 'safe'
on all the major Linux distros (which all have CONFIG_PARAVIRT=y), i.e. by 'fixing'
them we'd reintroduce random crashes into various fragile pieces of code...

To add insult to injury, the current 'silently safe by accident' MSR code isn't so
safe: because it leaves the result of the read uninitialized...

To fix this all I'd really like to have:

 - safe MSR reads by default (i.e. never boot crash the kernel on some rare
   condition - which to most users is either a silent boot hang or an instant
   restart). Historically we had a stream of 'silly boot crashes' due to MSR
   reads that generate a #GPF. They make Linux less usable around the edges,
   especially in the x86 non-server (desktop) space where most hardware vendors
   are either openly Linux hostile, or, at best, Linux oblivious.

 - proper result-zeroing behavior on exceptions

 - and we should also generate _some_ sort of warning when MSR exceptions happen
   in an 'unintended' fashion.

Maybe the warning could be put under a (default-enabled) config option for the
size conscious.

Or we could extend exception table entry encoding to include a 'warning bit', to
not bloat the kernel. If the exception handler code encounters such an exception
it would generate a one-time warning for that entry, but otherwise not crash the
kernel and continue execution with an all-zeroes result for the MSR read.

Thanks,

	Ingo
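[Editorial sketch] The 'warning bit' idea above can be modelled in plain userspace C. This is only an illustration under stated assumptions: the struct layout, the `warn`/`warned` fields and every `model_` name are hypothetical, a linear scan stands in for whatever lookup the kernel really uses, and keeping the "already warned" state in the table itself is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register snapshot; only the fields the sketch needs. */
struct model_regs {
	uint64_t ip, ax, dx;
};

/* Hypothetical exception table entry extended with a 'warning bit'. */
struct model_extable_entry {
	uint64_t insn;   /* address of the instruction that may fault */
	uint64_t fixup;  /* address at which execution resumes */
	bool warn;       /* the 'warning bit' for this entry */
	bool warned;     /* set once the one-time warning has fired */
};

/*
 * Look up the faulting address. On a hit, optionally emit a one-time
 * warning for that entry, zero the result registers (an all-zeroes
 * MSR read), and resume at the fixup address.
 */
bool model_fixup_exception(struct model_extable_entry *tbl, size_t n,
			   struct model_regs *regs)
{
	for (size_t i = 0; i < n; i++) {
		struct model_extable_entry *e = &tbl[i];

		if (e->insn != regs->ip)
			continue;

		if (e->warn && !e->warned) {
			e->warned = true;
			fprintf(stderr, "unexpected MSR fault at ip=0x%llx\n",
				(unsigned long long)regs->ip);
		}

		regs->ax = 0;
		regs->dx = 0;
		regs->ip = e->fixup;
		return true;
	}

	return false;	/* no entry: the caller would oops as before */
}
```

A fault at an address with no table entry still returns false, so genuinely unexpected faults keep their existing behaviour.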
On 21/09/2015 10:46, Ingo Molnar wrote:
> Or we could extend exception table entry encoding to include a 'warning bit', to
> not bloat the kernel. If the exception handler code encounters such an exception
> it would generate a one-time warning for that entry, but otherwise not crash the
> kernel and continue execution with an all-zeroes result for the MSR read.

The 'warning bit' already exists, it is the opcode that caused the fault. :)

The concern about bloat is a good one. However, why is it necessary to
keep native_*_msr* inline? If they are moved out-of-line, using the
exception table becomes the obvious solution and doesn't cause bloat
anymore.

Paolo
On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Linus, what's your preference?

So quite frankly, is there any reason we don't just implement
native_read_msr() as just

    unsigned long long native_read_msr(unsigned int msr)
    {
        int err;
        unsigned long long val;

        val = native_read_msr_safe(msr, &err);
        WARN_ON_ONCE(err);
        return val;
    }

Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
done with it. I don't see the downside.

How many msr reads are *so* critical that the function call
overhead would matter? Get rid of the inline version of the _safe()
thing too, and put that thing there too.

Linus
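[Editorial sketch] The wrapper proposed above can be tried out in userspace with stand-ins for the kernel pieces. Everything here is an assumption made for illustration: `model_read_msr_safe()` pretends that only MSR 0x10 exists, `MODEL_WARN_ON_ONCE` is a rough analogue of WARN_ON_ONCE (one warning per call site), and `model_warn_count` exists only so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Number of warnings emitted so far; present only for observation. */
int model_warn_count;

/* Rough userspace analogue of WARN_ON_ONCE: fires once per call site. */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool model_warned;				\
		if ((cond) && !model_warned) {				\
			model_warned = true;				\
			model_warn_count++;				\
			fprintf(stderr, "WARNING at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

/*
 * Hypothetical stand-in for native_read_msr_safe(): in this model only
 * MSR 0x10 "exists"; any other index reports an error and yields zero.
 */
static uint64_t model_read_msr_safe(uint32_t msr, int *err)
{
	if (msr == 0x10) {
		*err = 0;
		return 0x1234;
	}
	*err = -5;	/* stands in for a kernel error code */
	return 0;
}

/* Out-of-line "unsafe" read: same path as the safe one, plus a warning. */
uint64_t model_read_msr(uint32_t msr)
{
	int err;
	uint64_t val;

	val = model_read_msr_safe(msr, &err);
	MODEL_WARN_ON_ONCE(err);
	return val;
}
```

In this model a read of a missing MSR returns zero every time, but only the first failure produces a warning.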
On 9/21/2015 9:36 AM, Linus Torvalds wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Linus, what's your preference?
>
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
>
>     unsigned long long native_read_msr(unsigned int msr)
>     {
>         int err;
>         unsigned long long val;
>
>         val = native_read_msr_safe(msr, &err);
>         WARN_ON_ONCE(err);
>         return val;
>     }
>
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.
>
> How many msr reads are *so* critical that the function call
> overhead would matter?

if anything qualifies it'd be switch_to() and friends.

note that I'm not entirely happy about the notion of "safe" MSRs.
They're safe as in "won't fault".
Reading random MSRs isn't a generic safe operation though, but the name
sort of gives people the impression that it is. Even with _safe variants,
you still need to KNOW the MSR exists (by means of CPUID or similar)
unfortunately.
On Mon, Sep 21, 2015 at 9:49 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>> How many msr reads are *so* critical that the function call
>> overhead would matter?
>
> if anything qualifies it'd be switch_to() and friends.

Is there anything else than the FS/GS_BASE thing (possibly hidden
behind inlines etc that I didn't get from a quick grep)? And why is
that sometimes using the "safe" version (in do_arch_prctl()), and
sometimes not (switch_to())?

I'm not convinced that mess is a good argument for the status quo ;)

> note that I'm not entirely happy about the notion of "safe" MSRs.
> They're safe as in "won't fault".

I wouldn't object to renaming them.

Linus
On Mon, Sep 21, 2015 at 9:49 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 9/21/2015 9:36 AM, Linus Torvalds wrote:
>>
>> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> Linus, what's your preference?
>>
>> So quite frankly, is there any reason we don't just implement
>> native_read_msr() as just
>>
>>     unsigned long long native_read_msr(unsigned int msr)
>>     {
>>         int err;
>>         unsigned long long val;
>>
>>         val = native_read_msr_safe(msr, &err);
>>         WARN_ON_ONCE(err);
>>         return val;
>>     }
>>
>> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
>> done with it. I don't see the downside.
>>
>> How many msr reads are *so* critical that the function call
>> overhead would matter?
>
> if anything qualifies it'd be switch_to() and friends.

And maybe the KVM user return notifier. Unfortunately, switch_to
might gain another two MSR accesses at some point if we decide to fix
the bugs in there. Sigh.

> note that I'm not entirely happy about the notion of "safe" MSRs.
> They're safe as in "won't fault".
> Reading random MSRs isn't a generic safe operation though, but the name sort
> of gives people
> the impression that it is. Even with _safe variants, you still need to KNOW
> the MSR exists (by means
> of CPUID or similar) unfortunately.

I tend to agree.

Anyway, the fully out-of-line approach isn't obviously a bad idea, and
it simplifies the whole mess (we can drop most of the paravirt
patches, too). I'll give it a try and see what happens.

--Andy
On Mon, Sep 21, 2015 at 9:36 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Linus, what's your preference?
>
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
>
>     unsigned long long native_read_msr(unsigned int msr)
>     {
>         int err;
>         unsigned long long val;
>
>         val = native_read_msr_safe(msr, &err);
>         WARN_ON_ONCE(err);
>         return val;
>     }
>
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.

In the interest of sanity, I want to drop the "native_", too, since
there appear to be few or no good use cases for native_read_msr as
such. I'm tempted to add new functions read_msr and write_msr that
forward to rdmsrl_safe and wrmsrl_safe.

It looks like the msr helpers are every bit as bad as the TSC helpers
used to be :(

--Andy
On Mon, Sep 21, 2015 at 11:16:30AM -0700, Andy Lutomirski wrote:
> In the interest of sanity, I want to drop the "native_", too, since
> there appear to be few or no good use cases for native_read_msr as
> such. I'm tempted to add new functions read_msr and write_msr that
> forward to rdmsrl_safe and wrmsrl_safe.

Just change the msr_read/msr_write() ones in arch/x86/lib/msr.c to take
a u64 and you're there.

> It looks like the msr helpers are every bit as bad as the TSC helpers
> used to be :(

Yap.
On Mon, Sep 21, 2015 at 11:16 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> In the interest of sanity, I want to drop the "native_", too

Yes. I think the only reason it exists is to have that wrapper layer
for PV. And that argument just goes away if you just make the
non-inline helper function do all the PV logic directly.

I really suspect we should do this for a *lot* of the PV ops. Yeah,
some are so performance-critical that we probably do have a good
reason for the inline indirections etc (historical example: native
spin-unlock, which traditionally could be done as a single store
instruction), but I suspect a lot of the PV indirection is for this
kind of "historical wrapper model" reason, and it often makes it
really hard to see what is going on because you have to go through
several layers of indirection, often in different files.

Linus
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Linus, what's your preference?
>
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
>
>     unsigned long long native_read_msr(unsigned int msr)
>     {
>         int err;
>         unsigned long long val;
>
>         val = native_read_msr_safe(msr, &err);
>         WARN_ON_ONCE(err);
>         return val;
>     }
>
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.

Absolutely!

> How many msr reads are *so* critical that the function call overhead would
> matter? Get rid of the inline version of the _safe() thing too, and put that
> thing there too.

Only a very low number of them is performance critical (because even
hw-accelerated MSR accesses are generally slow so we try to avoid MSR accesses in
fast paths as much as possible, via shadowing, etc.) - and in the few cases where
we have to access an MSR in a fast path we can do those separately.

I'm only worried about the 'default' APIs, i.e. rdmsr() that is used throughout
arch/x86/ over a hundred times, not about performance critical code paths that get
enough testing and enough attention in general.

Thanks,

	Ingo
On 21/09/2015 19:43, Andy Lutomirski wrote:
> And maybe the KVM user return notifier.

No, not really. If anything, the place in KVM where it makes a
difference is vmx_save_host_state, which is also only using
always-present MSRs. But don't care about KVM.

First clean it up, then we can add back inline versions like __rdmsr or
rdmsr_fault or rdmsr_unsafe or whatever.

Paolo
On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Linus, what's your preference?
>
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
>
>     unsigned long long native_read_msr(unsigned int msr)
>     {
>         int err;
>         unsigned long long val;
>
>         val = native_read_msr_safe(msr, &err);
>         WARN_ON_ONCE(err);
>         return val;
>     }
>
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.
>
> How many msr reads are *so* critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.

There are a few in the perf code, and esp. on cores without a stack engine
the call overhead is noticeable. Also note that the perf MSRs are generally
optimized MSRs and less slow (we cannot say fast, they're still MSRs) than
regular MSRs.
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > Linus, what's your preference?
> >
> > So quite frankly, is there any reason we don't just implement
> > native_read_msr() as just
> >
> >     unsigned long long native_read_msr(unsigned int msr)
> >     {
> >         int err;
> >         unsigned long long val;
> >
> >         val = native_read_msr_safe(msr, &err);
> >         WARN_ON_ONCE(err);
> >         return val;
> >     }
> >
> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> > done with it. I don't see the downside.
> >
> > How many msr reads are *so* critical that the function call
> > overhead would matter? Get rid of the inline version of the _safe()
> > thing too, and put that thing there too.
>
> There are a few in the perf code, and esp. on cores without a stack engine the
> call overhead is noticeable. Also note that the perf MSRs are generally
> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than
> regular MSRs.

These could still be open coded in an inlined fashion, like the scheduler usage.

Thanks,

	Ingo
On Wed, Sep 30, 2015 at 7:01 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
>> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> > >
>> > > Linus, what's your preference?
>> >
>> > So quite frankly, is there any reason we don't just implement
>> > native_read_msr() as just
>> >
>> >     unsigned long long native_read_msr(unsigned int msr)
>> >     {
>> >         int err;
>> >         unsigned long long val;
>> >
>> >         val = native_read_msr_safe(msr, &err);
>> >         WARN_ON_ONCE(err);
>> >         return val;
>> >     }
>> >
>> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
>> > done with it. I don't see the downside.
>> >
>> > How many msr reads are *so* critical that the function call
>> > overhead would matter? Get rid of the inline version of the _safe()
>> > thing too, and put that thing there too.
>>
>> There are a few in the perf code, and esp. on cores without a stack engine the
>> call overhead is noticeable. Also note that the perf MSRs are generally
>> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than
>> regular MSRs.
>
> These could still be open coded in an inlined fashion, like the scheduler usage.
>

We could have a raw_rdmsr for those.

OTOH, I'm still not 100% convinced that this warn-but-don't-die
behavior is worth the effort. This isn't a frequent source of bugs to
my knowledge, and we don't try to recover from incorrect cr writes,
out-of-bounds MMIO, etc, so do we really gain much by rigging a
recovery mechanism for rdmsr and wrmsr failures for code that doesn't
use the _safe variants?

--Andy

> Thanks,
>
> Ingo
On 09/21/2015 09:36 AM, Linus Torvalds wrote:
>
> How many msr reads are *so* critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
>

Probably only the ones that may go in the context switch path.

	-hpa
* Andy Lutomirski <luto@amacapital.net> wrote:

> > These could still be open coded in an inlined fashion, like the scheduler usage.
>
> We could have a raw_rdmsr for those.
>
> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
> worth the effort. This isn't a frequent source of bugs to my knowledge, and we
> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do we
> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
> for code that doesn't use the _safe variants?

It's just the general principle really: don't crash the kernel on bootup. There's
few things more user hostile than that.

Also, this would maintain the status quo: since we now (accidentally) don't crash
the kernel on distro kernels (but silently and unsafely ignore the faulting
instruction), we should not regress that behavior (by adding the chance to crash
again), but improve upon it.

Thanks,

	Ingo
On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>> > These could still be open coded in an inlined fashion, like the scheduler usage.
>>
>> We could have a raw_rdmsr for those.
>>
>> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
>> worth the effort. This isn't a frequent source of bugs to my knowledge, and we
>> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do we
>> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
>> for code that doesn't use the _safe variants?
>
> It's just the general principle really: don't crash the kernel on bootup. There's
> few things more user hostile than that.
>
> Also, this would maintain the status quo: since we now (accidentally) don't crash
> the kernel on distro kernels (but silently and unsafely ignore the faulting
> instruction), we should not regress that behavior (by adding the chance to crash
> again), but improve upon it.

Just a heads up: the extable improvements in tip:ras/core make it
straightforward to get the best of all worlds: explicit failure
handling (written in C!), no fast path overhead whatsoever, and no new
garbage in the exception handlers.

Patches coming once I test them.

> Thanks,
>
> Ingo
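[Editorial sketch] One way to picture "explicit failure handling written in C" is an exception table whose entries each name a handler function. The userspace model below is an assumption-laden illustration, not the tip:ras/core code: every `sketch_` name, the struct layout and the linear lookup are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register snapshot; only the fields the sketch needs. */
struct sketch_regs {
	uint64_t ip, ax, cx, dx;
};

struct sketch_extable_entry;

/* Per-entry handler: returns true if the fault was handled. */
typedef bool (*sketch_handler_t)(const struct sketch_extable_entry *e,
				 struct sketch_regs *regs);

struct sketch_extable_entry {
	uint64_t insn;			/* address of the faulting instruction */
	uint64_t fixup;			/* address at which to resume */
	sketch_handler_t handler;	/* C code that deals with the fault */
};

/* Count of handled MSR read faults; present only for observation. */
int sketch_msr_faults;

/* Plain fixup: just resume at the fixup address. */
bool sketch_handler_default(const struct sketch_extable_entry *e,
			    struct sketch_regs *regs)
{
	regs->ip = e->fixup;
	return true;
}

/* MSR read fixup: warn on the first fault, return zero in ax/dx, resume. */
bool sketch_handler_rdmsr(const struct sketch_extable_entry *e,
			  struct sketch_regs *regs)
{
	if (sketch_msr_faults == 0)
		fprintf(stderr, "unchecked RDMSR fault, MSR 0x%x\n",
			(unsigned int)regs->cx);
	sketch_msr_faults++;

	regs->ax = 0;
	regs->dx = 0;
	regs->ip = e->fixup;
	return true;
}

/* Dispatch: find the entry for the faulting ip and run its handler. */
bool sketch_fixup_exception(const struct sketch_extable_entry *tbl, size_t n,
			    struct sketch_regs *regs)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == regs->ip)
			return tbl[i].handler(&tbl[i], regs);

	return false;
}
```

In this model the warning and zeroing logic runs only after a fault, so the non-faulting path pays nothing beyond the table entry itself.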
* Andy Lutomirski <luto@amacapital.net> wrote:

> On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >> > These could still be open coded in an inlined fashion, like the scheduler usage.
> >>
> >> We could have a raw_rdmsr for those.
> >>
> >> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
> >> worth the effort. This isn't a frequent source of bugs to my knowledge, and we
> >> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do we
> >> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
> >> for code that doesn't use the _safe variants?
> >
> > It's just the general principle really: don't crash the kernel on bootup. There's
> > few things more user hostile than that.
> >
> > Also, this would maintain the status quo: since we now (accidentally) don't crash
> > the kernel on distro kernels (but silently and unsafely ignore the faulting
> > instruction), we should not regress that behavior (by adding the chance to crash
> > again), but improve upon it.
>
> Just a heads up: the extable improvements in tip:ras/core make it
> straightforward to get the best of all worlds: explicit failure
> handling (written in C!), no fast path overhead whatsoever, and no new
> garbage in the exception handlers.

I _knew_ I should have merged them into tip:x86/mm, not tip:ras/core ;-)

I had a quick look at your new MSR series and I'm very happy with that direction!

Thanks,

	Ingo
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..f82987643e32 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,58 @@ exit_trap:
 	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
 }
 
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+	/*
+	 * Try to decode the opcode that failed. So far, we only care
+	 * about boring two-byte unprefixed opcodes, so we don't need
+	 * the full instruction decoder machinery.
+	 */
+	u16 opcode;
+
+	if (probe_kernel_read(&opcode, (const void *)regs->ip, sizeof(opcode)))
+		return false;
+
+	if (opcode == 0x320f) {
+		/* RDMSR */
+		pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
+			(unsigned int)regs->cx);
+		if (!panic_on_oops) {
+			WARN_ON_ONCE(true);
+
+			/*
+			 * Pretend that RDMSR worked and returned zero. We
+			 * chose zero because zero seems less likely to
+			 * cause further malfunctions than any other value.
+			 */
+			regs->ax = 0;
+			regs->dx = 0;
+			regs->ip += 2;
+			return true;
+		} else {
+			/* Don't fix it up. */
+			return false;
+		}
+	} else if (opcode == 0x300f) {
+		/* WRMSR */
+		pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x",
+			(unsigned int)regs->dx, (unsigned int)regs->ax,
+			(unsigned int)regs->cx);
+		if (!panic_on_oops) {
+			WARN_ON_ONCE(true);
+
+			/* Pretend it worked and carry on. */
+			regs->ip += 2;
+			return true;
+		} else {
+			/* Don't fix it up. */
+			return false;
+		}
+	}
+
+	return false;
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
@@ -456,6 +508,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	if (fixup_exception(regs))
 		return;
 
+	if (paper_over_kernel_gpf(regs))
+		return;
+
 	tsk->thread.error_code = error_code;
 	tsk->thread.trap_nr = X86_TRAP_GP;
 	if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
We still write a pr_info entry unconditionally for debugging.

To be clear, this type of failure should *not* happen. This patch
exists to minimize the chance of nasty undebuggable failures on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
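[Editorial sketch] The two-byte opcode check in the patch relies on RDMSR being encoded as the bytes 0F 32 and WRMSR as 0F 30, which read as 0x320f and 0x300f when loaded as a little-endian 16-bit value. The userspace sketch below shows just that classification step; the enum and function names are hypothetical, and the bytes are combined explicitly so the result does not depend on host endianness.

```c
#include <stdint.h>

/* Hypothetical classification of the instruction at the faulting address. */
enum sketch_msr_insn {
	SKETCH_INSN_OTHER,
	SKETCH_INSN_RDMSR,
	SKETCH_INSN_WRMSR,
};

/*
 * Combine the first two instruction bytes the way a little-endian
 * 16-bit load would, then compare against the unprefixed two-byte
 * encodings of RDMSR (0F 32) and WRMSR (0F 30).
 */
enum sketch_msr_insn sketch_classify_insn(const uint8_t *ip)
{
	uint16_t opcode = (uint16_t)(ip[0] | ((uint16_t)ip[1] << 8));

	if (opcode == 0x320f)
		return SKETCH_INSN_RDMSR;
	if (opcode == 0x300f)
		return SKETCH_INSN_WRMSR;
	return SKETCH_INSN_OTHER;
}
```

Any other byte pair, including a prefixed form of either instruction, falls through to `SKETCH_INSN_OTHER`, matching the patch's choice to handle only the unprefixed encodings.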