[1/3] x86: Add task_struct flag to force SIGBUS on MCE

Message ID	20240710125445.564245-1-andrew.zaborowski@intel.com (mailing list archive)
State	New
Series	[1/3] x86: Add task_struct flag to force SIGBUS on MCE
	[2/3] execve: Ensure SIGBUS delivered on memory failure
	[3/3] rseq: Ensure SIGBUS delivered on memory failure

Headers

From: Andrew Zaborowski <andrew.zaborowski@intel.com>
To: linux-edac@vger.kernel.org,
	linux-mm@kvack.org
Cc: Kees Cook <keescook@chromium.org>,
	Tony Luck <tony.luck@intel.com>,
	Eric Biederman <ebiederm@xmission.com>,
	Borislav Petkov <bp@alien8.de>
Subject: [PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE
Date: Wed, 10 Jul 2024 05:54:43 -0700
Message-ID: <20240710125445.564245-1-andrew.zaborowski@intel.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Commit Message

Andrew Zaborowski July 10, 2024, 12:54 p.m. UTC

Uncorrected memory errors on user pages are signaled to processes with
SIGBUS or, if the error happens in a syscall, with an error return value
from the syscall.  The SIGBUS is documented in
Documentation/mm/hwpoison.rst#failure-recovery-modes.
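
For context, a userspace program can tell a memory-failure SIGBUS apart
from other bus errors by looking at si_code in the siginfo_t it receives.
The sketch below is illustrative only and not part of this patch: the enum
and the helper classify_sigbus() are made-up names, and the fallback
constant values assume Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>

/* Fallbacks in case the libc headers do not expose these (Linux values). */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification of a SIGBUS, for illustration only. */
enum memerr_kind {
	MEMERR_NONE,		/* some other bus error */
	MEMERR_ACTION_REQUIRED,	/* BUS_MCEERR_AR: poisoned data was consumed */
	MEMERR_ACTION_OPTIONAL,	/* BUS_MCEERR_AO: poison found, not yet consumed */
};

/* Map the si_code of a SIGBUS siginfo_t to one of the kinds above. */
static enum memerr_kind classify_sigbus(const siginfo_t *info)
{
	assert(info && info->si_signo == SIGBUS);

	switch (info->si_code) {
	case BUS_MCEERR_AR:
		return MEMERR_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return MEMERR_ACTION_OPTIONAL;
	default:
		return MEMERR_NONE;
	}
}
```

A handler installed with sigaction() and SA_SIGINFO receives such a
siginfo_t; inside a real handler the assert() should be dropped, since it
is not async-signal-safe.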

But there are corner cases where we cannot or don't want to return a
plain error from the syscall.  Subsequent commits cover two such cases:
execve and rseq.  The current code, in both places, kills the task with
a SIGSEGV on error.  While not explicitly stated, it can be argued that
it should be a SIGBUS, for consistency and for the benefit of userspace
signal handlers.  Even if the process cannot handle the signal, perhaps
the parent process can.  This was the case in the scenario that
motivated this patch.
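
To illustrate why the signal number matters to a parent process, here is
a minimal userspace sketch of a supervisor that checks how its child died.
It is an assumption-laden example, not code from this series: the child
simply raises the signal itself to stand in for a task killed by the
kernel, and both helper names are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that dies from `sig` with the default disposition and
 * return the signal number the parent observes (0 if the child exited
 * normally, -1 on error).  No real memory error is involved.
 */
static int child_termsig(int sig)
{
	int status;
	pid_t pid;

	assert(sig > 0);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct rlimit no_core = { 0, 0 };

		setrlimit(RLIMIT_CORE, &no_core);	/* avoid core files */
		signal(sig, SIG_DFL);
		raise(sig);
		_exit(0);	/* not reached if the signal is fatal */
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical supervisor policy: only SIGBUS hints at bad memory. */
static int looks_like_memory_failure(int termsig)
{
	return termsig == SIGBUS;
}
```

With a SIGSEGV the parent in this sketch cannot distinguish a hardware
memory error from an ordinary crash; with a SIGBUS it at least has a hint.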

In both cases, the architecture's exception handler (the MCE handler on
x86) will queue a call to memory_failure().  That is not enough, because
the syscall-specific code sees the -EFAULT and terminates the task before
the queued work runs.

To fix this:

1. Let pending work run in the error cases in both places.

2. On MCE, ensure memory_failure() is passed MF_ACTION_REQUIRED so
   that the SIGBUS is queued.  Normally, when the MCE happens in a
   syscall, a fixup of the return IP and a call to kill_me_never() are
   what we want.  In this case, however, it is necessary to queue
   kill_me_maybe(), which sets MF_ACTION_REQUIRED, the flag checked by
   memory_failure().

To do this, the syscall code sets current->kill_on_efault, a new
task_struct flag, and do_machine_check() in
arch/x86/kernel/cpu/mce/core.c checks that flag.
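
The resulting choice of callback for an MCE taken in kernel mode can be
modelled in plain C as follows.  This is a simplified userspace sketch of
the branches touched by the diff, not the kernel code itself: the struct,
the enum and kernel_mode_action() are illustrative names, and the
fixup_exception() step and the panic path are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Which task-work callback the modelled handler would queue. */
enum mce_action {
	ACT_NONE,		/* nothing queued */
	ACT_KILL_ME_NEVER,	/* memory_failure() without MF_ACTION_REQUIRED */
	ACT_KILL_ME_MAYBE,	/* memory_failure() with MF_ACTION_REQUIRED */
	ACT_KILL_ME_NOW,	/* no usable address to hand to memory_failure() */
};

/* Inputs of the model; names are illustrative, not kernel API. */
struct mce_model {
	bool kill_on_efault;	/* the new task_struct flag */
	bool in_kernel_copyin;	/* stands in for MCE_IN_KERNEL_COPYIN */
	bool usable_address;	/* stands in for mce_usable_address() */
};

static enum mce_action kernel_mode_action(const struct mce_model *m)
{
	assert(m);

	/* Existing behaviour: the syscall returns an error to userspace. */
	if (!m->kill_on_efault)
		return m->in_kernel_copyin ? ACT_KILL_ME_NEVER : ACT_NONE;

	/* New branch: the task will not return to userspace normally. */
	return m->usable_address ? ACT_KILL_ME_MAYBE : ACT_KILL_ME_NOW;
}
```

With kill_on_efault clear the model reproduces the old behaviour, so only
callers that set the flag are affected.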

Note: the flag is not x86 specific even if only x86 handling is being
added here.  The definition could be guarded by #ifdef
CONFIG_MEMORY_FAILURE, but it would then need set/clear utilities.
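
As a rough idea of what such utilities might look like, here is a
userspace mock.  Everything in it is hypothetical: the struct is a
stand-in for task_struct, the helper names are invented for this example,
and CONFIG_MEMORY_FAILURE is defined by hand only so that the sketch
compiles on its own.

```c
#include <assert.h>

#define CONFIG_MEMORY_FAILURE 1	/* defined here only for the demo */

/* Mock of the relevant part of task_struct, for illustration only. */
struct task_struct_mock {
#ifdef CONFIG_MEMORY_FAILURE
	unsigned kill_on_efault:1;
#endif
	unsigned other_flag:1;
};

#ifdef CONFIG_MEMORY_FAILURE
static inline void task_set_kill_on_efault(struct task_struct_mock *t)
{
	assert(t);
	t->kill_on_efault = 1;
}

static inline void task_clear_kill_on_efault(struct task_struct_mock *t)
{
	assert(t);
	t->kill_on_efault = 0;
}

static inline int task_kill_on_efault(const struct task_struct_mock *t)
{
	assert(t);
	return t->kill_on_efault;
}
#else
/* Without memory-failure support the helpers compile away. */
static inline void task_set_kill_on_efault(struct task_struct_mock *t) { (void)t; }
static inline void task_clear_kill_on_efault(struct task_struct_mock *t) { (void)t; }
static inline int task_kill_on_efault(const struct task_struct_mock *t) { (void)t; return 0; }
#endif
```

Callers would then never touch the bit directly, which is the extra
indirection the note above refers to.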

Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
---
This is a v2 of
https://lore.kernel.org/linux-mm/20240501015340.3014724-1-andrew.zaborowski@intel.com/
In v1, the existing flag current->in_execve was reused instead of
adding a new one.  Kees Cook commented in
https://lore.kernel.org/linux-mm/202405010915.465AF19@keescook/ that
current->in_execve is going away.  Lacking a better idea, and seeing
that execve() and rseq() would benefit from using a common mechanism, I
decided to add this new flag.

Perhaps, with a better name, current->kill_on_efault could replace
bprm->point_of_no_return to offset the pain of having this extra flag.
---
 arch/x86/kernel/cpu/mce/core.c | 18 +++++++++++++++++-
 include/linux/sched.h          |  2 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

Comments

Borislav Petkov July 10, 2024, 2:52 p.m. UTC | #1

On Wed, Jul 10, 2024 at 05:54:43AM -0700, Andrew Zaborowski wrote:
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.

Now deleted.

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ad0623b65..13f2ace3d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1611,7 +1611,7 @@  noinstr void do_machine_check(struct pt_regs *regs)
 			if (p)
 				SetPageHWPoison(p);
 		}
-	} else {
+	} else if (!current->kill_on_efault) {
 		/*
 		 * Handle an MCE which has happened in kernel space but from
 		 * which the kernel can recover: ex_has_fault_handler() has
@@ -1628,6 +1628,22 @@  noinstr void do_machine_check(struct pt_regs *regs)
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
 			queue_task_work(&m, msg, kill_me_never);
+	} else {
+		/*
+		 * Even with recovery code extra handling is required when
+		 * we're not returning to userspace after error (e.g. in
+		 * execve() beyond the point of no return) to ensure that
+		 * a SIGBUS is delivered.
+		 */
+		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
+				mce_panic("Failed kernel mode recovery", &m, msg);
+		}
+
+		if (!mce_usable_address(&m))
+			queue_task_work(&m, msg, kill_me_now);
+		else
+			queue_task_work(&m, msg, kill_me_maybe);
 	}
 
 out:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6e..0cde1ba11 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -975,6 +975,8 @@  struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned                        in_thrashing:1;
 #endif
+	/* Kill task on user memory access error */
+	unsigned                        kill_on_efault:1;
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
