diff mbox series

[2/2] futex, x86/mce: Avoid double machine checks

Message ID 20210108222251.14391-3-tony.luck@intel.com (mailing list archive)
State New, archived
Headers show
Series Fix infinite machine check loop in futex_wait_setup() | expand

Commit Message

Tony Luck Jan. 8, 2021, 10:22 p.m. UTC
futex_wait_setup() first tries to read the user value with page faults
disabled (because it holds a lock, and so cannot sleep). If that read
fails it drops the lock and tries again.

But there are now two reasons why the user space read can fail. Either:
1) legacy case of a page fault, in which case it is reasonable to retry
2) machine check on the user address, bad idea to re-read

Add some infrastructure to differentiate these cases.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mmu.h     |  7 +++++++
 arch/x86/kernel/cpu/mce/core.c | 10 ++++++++++
 include/linux/mm.h             |  4 ++++
 kernel/futex.c                 |  3 +++
 4 files changed, 24 insertions(+)

Comments

Peter Zijlstra Jan. 8, 2021, 10:47 p.m. UTC | #1
On Fri, Jan 08, 2021 at 02:22:51PM -0800, Tony Luck wrote:
> futex_wait_setup() first tries to read the user value with page faults
> disabled (because it holds a lock, and so cannot sleep). If that read
> fails it drops the lock and tries again.
> 
> But there are now two reasons why the user space read can fail. Either:
> 1) legacy case of a page fault, in which case it is reasonable to retry
> 2) machine check on the user address, bad idea to re-read
> 
> Add some infrastructure to differentiate these cases.

> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2658,6 +2658,9 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
>  	if (ret) {
>  		queue_unlock(*hb);
>  
> +		if (arch_memory_failure(uaddr))
> +			return ret;
> +
>  		ret = get_user(uval, uaddr);
>  		if (ret)
>  			return ret;


I think this is horrid; why can't we have it return something different
then -EFAULT instead?
Tony Luck Jan. 8, 2021, 11:08 p.m. UTC | #2
> I think this is horrid; why can't we have it return something different
> then -EFAULT instead?

I did consider this ... but it appears that architectures aren't unified in the
return value from get_user()

Here's another function involved in the futex call chain leading to this:

static int get_futex_value_locked(u32 *dest, u32 __user *from)
{
        int ret;

        pagefault_disable();
        ret = __get_user(*dest, from);
        pagefault_enable();

        return ret ? -EFAULT : 0;
}

It seems like the expectation here is just "zero or not" and we
don't care what the "not" value is ... just turn it into -EFAULT.

-Tony
Peter Zijlstra Jan. 8, 2021, 11:14 p.m. UTC | #3
On Fri, Jan 08, 2021 at 11:08:58PM +0000, Luck, Tony wrote:
> > I think this is horrid; why can't we have it return something different
> > then -EFAULT instead?
> 
> I did consider this ... but it appears that architectures aren't unified in the
> return value from get_user()

But surely none are currently returning -EMEMERR or whatever name we
come up with.

> Here's another function involved in the futex call chain leading to this:
> 
> static int get_futex_value_locked(u32 *dest, u32 __user *from)
> {
>         int ret;
> 
>         pagefault_disable();
>         ret = __get_user(*dest, from);
>         pagefault_enable();
> 
>         return ret ? -EFAULT : 0;
> }
> 
> It seems like the expectation here is just "zero or not" and we
> don't care what the "not" value is ... just turn it into -EFAULT.

Yeah, saw that, but that should be trivially fixable I'm thinking.
Tony Luck Jan. 8, 2021, 11:20 p.m. UTC | #4
> Yeah, saw that, but that should be trivially fixable I'm thinking.

Trivial, maybe ... but then follows the audit of other get_user() calls:

git grep get_user\( | wc -l
2003

:-(

-Tony
diff mbox series

Patch

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..a46c78381388 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -66,4 +66,11 @@  typedef struct {
 void leave_mm(int cpu);
 #define leave_mm leave_mm
 
+#if defined(CONFIG_X86_MCE) && defined(CONFIG_MEMORY_FAILURE)
+#undef arch_memory_failure
+#define arch_memory_failure x86_memory_failure
+#endif
+
+bool x86_memory_failure(u32 __user *addr);
+
 #endif /* _ASM_X86_MMU_H */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1bf11213e093..b27aa30290bb 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1236,6 +1236,16 @@  static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin
 	*m = *final;
 }
 
+bool x86_memory_failure(u32 __user *addr)
+{
+	if (current->mce_busy == 0)
+		return false;
+
+	WARN_ON(current->mce_vaddr != addr);
+
+	return true;
+}
+
 static void kill_me_now(struct callback_head *ch)
 {
 	force_sig(SIGBUS);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..470708a71dd3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3177,5 +3177,9 @@  unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+#ifndef arch_memory_failure
+#define arch_memory_failure(vaddr)	(0)
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/kernel/futex.c b/kernel/futex.c
index c47d1015d759..8fa2fc854026 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2658,6 +2658,9 @@  static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	if (ret) {
 		queue_unlock(*hb);
 
+		if (arch_memory_failure(uaddr))
+			return ret;
+
 		ret = get_user(uval, uaddr);
 		if (ret)
 			return ret;