[1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

Message ID 203bb8a52efae1781281fb70ccd45c3e164fbce2.1442523997.git.luto@kernel.org (mailing list archive)
State New, archived

Commit Message

Andy Lutomirski Sept. 17, 2015, 9:11 p.m. UTC
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of poisoned values (in the
RDMSR case).  We still write a pr_info entry unconditionally for
debugging.

To be clear, this type of failure should *not* happen.  This patch
exists to minimize the chance of nasty undebuggable failures on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
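
The decision described in the commit message can be sketched in user space as follows. This is only an illustration, not the kernel code: `struct fake_regs`, `classify_opcode()` and `paper_over_sketch()` are hypothetical names standing in for `struct pt_regs` and the patch's `paper_over_kernel_gpf()`, and the logging and WARN_ON_ONCE steps are left out. It relies on RDMSR being encoded as the bytes 0f 32 and WRMSR as 0f 30, which is why a little-endian 16-bit load of the opcode yields 0x320f and 0x300f.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pt_regs fields the fixup touches. */
struct fake_regs {
	uint64_t ax, cx, dx, ip;
};

enum msr_op { MSR_OP_NONE, MSR_OP_RDMSR, MSR_OP_WRMSR };

/* Classify the two unprefixed opcode bytes at the faulting IP. */
static enum msr_op classify_opcode(const uint8_t *insn)
{
	if (insn[0] != 0x0f)
		return MSR_OP_NONE;
	if (insn[1] == 0x32)
		return MSR_OP_RDMSR;
	if (insn[1] == 0x30)
		return MSR_OP_WRMSR;
	return MSR_OP_NONE;
}

/*
 * Returns true if the fault was papered over, false if the caller
 * should continue down the normal oops path.
 */
static bool paper_over_sketch(struct fake_regs *regs, const uint8_t *insn,
			      bool panic_on_oops)
{
	enum msr_op op = classify_opcode(insn);

	/* Unknown instruction, or the admin asked for a panic: no fixup. */
	if (op == MSR_OP_NONE || panic_on_oops)
		return false;

	if (op == MSR_OP_RDMSR) {
		/* Same deterministic poison values as the patch. */
		regs->ax = 0x5aadc0de;
		regs->dx = 0x8badf00d;
	}

	/* Skip the two-byte instruction and carry on. */
	regs->ip += 2;
	return true;
}
```

With panic_on_oops clear, a faulting RDMSR returns with the poison in ax/dx and ip advanced by two; with panic_on_oops set, the registers are left untouched and the fault falls through to the usual oops handling.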

Comments

Ingo Molnar Sept. 18, 2015, 7:14 a.m. UTC | #1
* Andy Lutomirski <luto@kernel.org> wrote:

> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of poisoned values (in the
> RDMSR case).  We still write a pr_info entry unconditionally for
> debugging.
> 
> To be clear, this type of failure should *not* happen.  This patch
> exists to minimize the chance of nasty undebuggable failures on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.

> +	if (opcode == 0x320f) {
> +		/* RDMSR */
> +		pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
> +			(unsigned int)regs->cx);
> +		if (!panic_on_oops) {
> +			WARN_ON_ONCE(true);
> +
> +			/* Patch it up with deterministic poison. */
> +			regs->ax = 0x5aadc0de;
> +			regs->dx = 0x8badf00d;
> +			regs->ip += 2;
> +			return true;

IMHO this should really not poison the result, but use zero as the result.

The poison might randomly indicate a 'present' feature in various registers that
might be accessed in a buggy way. Don't send the code further down into la-la-land
by giving it a 'success'.

And yes, zero can mean success too, but we have to pick a side here ...

The warning will be enough to fix these up; people (and in particular distro
testing people) will be watching out for them.

Thanks,

	Ingo
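
The concern above can be made concrete with a small user-space sketch. It assumes a hypothetical caller that treats a set bit in the returned MSR value as "feature present"; `msr_value()` and `feature_present()` are illustrative names, not kernel APIs. The poison constants are the ones from the patch, combined as EDX:EAX in the way RDMSR returns a 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

#define POISON_EAX 0x5aadc0deu	/* low half, from the patch */
#define POISON_EDX 0x8badf00du	/* high half, from the patch */

/* Combine EDX:EAX into the 64-bit value an RDMSR caller sees. */
static uint64_t msr_value(uint32_t edx, uint32_t eax)
{
	return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical caller logic: a set bit means the feature exists. */
static bool feature_present(uint64_t msr, unsigned int bit)
{
	return (msr >> bit) & 1;
}
```

With the poison, `feature_present(msr_value(POISON_EDX, POISON_EAX), 1)` is true while bit 0 reads as clear, so which "features" appear depends purely on the bit pattern of the constants. With a zero result every bit reads as absent, which is the side the reply argues for picking.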

Patch

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..b7731765017a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,54 @@  exit_trap:
 	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
 }
 
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+	/*
+	 * Try to decode the opcode that failed.  So far, we only care
+	 * about boring two-byte unprefixed opcodes, so we don't need
+	 * the full instruction decoder machinery.
+	 */
+	u16 opcode;
+
+	if (probe_kernel_read(&opcode, (const void *)regs->ip, sizeof(opcode)))
+		return false;
+
+	if (opcode == 0x320f) {
+		/* RDMSR */
+		pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
+			(unsigned int)regs->cx);
+		if (!panic_on_oops) {
+			WARN_ON_ONCE(true);
+
+			/* Patch it up with deterministic poison. */
+			regs->ax = 0x5aadc0de;
+			regs->dx = 0x8badf00d;
+			regs->ip += 2;
+			return true;
+		} else {
+			/* Don't fix it up. */
+			return false;
+		}
+	} else if (opcode == 0x300f) {
+		/* WRMSR */
+		pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x",
+			(unsigned int)regs->dx, (unsigned int)regs->ax,
+			(unsigned int)regs->cx);
+		if (!panic_on_oops) {
+			WARN_ON_ONCE(true);
+
+			/* Pretend it worked and carry on. */
+			regs->ip += 2;
+			return true;
+		} else {
+			/* Don't fix it up. */
+			return false;
+		}
+	}
+
+	return false;
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
@@ -456,6 +504,9 @@  do_general_protection(struct pt_regs *regs, long error_code)
 		if (fixup_exception(regs))
 			return;
 
+		if (paper_over_kernel_gpf(regs))
+			return;
+
 		tsk->thread.error_code = error_code;
 		tsk->thread.trap_nr = X86_TRAP_GP;
 		if (notify_die(DIE_GPF, "general protection fault", regs, error_code,