Message ID | 20210108222251.14391-3-tony.luck@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix infinite machine check loop in futex_wait_setup() | expand |
On Fri, Jan 08, 2021 at 02:22:51PM -0800, Tony Luck wrote: > futex_wait_setup() first tries to read the user value with page faults > disabled (because it holds a lock, and so cannot sleep). If that read > fails it drops the lock and tries again. > > But there are now two reasons why the user space read can fail. Either: > 1) legacy case of a page fault, in which case it is reasonable to retry > 2) machine check on the user address, bad idea to re-read > > Add some infrastructure to differentiate these cases. > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -2658,6 +2658,9 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, > if (ret) { > queue_unlock(*hb); > > + if (arch_memory_failure(uaddr)) > + return ret; > + > ret = get_user(uval, uaddr); > if (ret) > return ret; I think this is horrid; why can't we have it return something different then -EFAULT instead?
> I think this is horrid; why can't we have it return something different > then -EFAULT instead? I did consider this ... but it appears that architectures aren't unified in the return value from get_user() Here's another function involved in the futex call chain leading to this: static int get_futex_value_locked(u32 *dest, u32 __user *from) { int ret; pagefault_disable(); ret = __get_user(*dest, from); pagefault_enable(); return ret ? -EFAULT : 0; } It seems like the expectation here is just "zero or not" and we don't care what the "not" value is ... just turn it into -EFAULT. -Tony
On Fri, Jan 08, 2021 at 11:08:58PM +0000, Luck, Tony wrote: > > I think this is horrid; why can't we have it return something different > > then -EFAULT instead? > > I did consider this ... but it appears that architectures aren't unified in the > return value from get_user() But surely none are currently returning -EMEMERR or whatever name we come up with. > Here's another function involved in the futex call chain leading to this: > > static int get_futex_value_locked(u32 *dest, u32 __user *from) > { > int ret; > > pagefault_disable(); > ret = __get_user(*dest, from); > pagefault_enable(); > > return ret ? -EFAULT : 0; > } > > It seems like the expectation here is just "zero or not" and we > don't care what the "not" value is ... just turn it into -EFAULT. Yeah, saw that, but that should be trivially fixable I'm thinking.
> Yeah, saw that, but that should be trivially fixable I'm thinking.
Trivial, maybe ... but then follows the audit of other get_user() calls:
git grep get_user\( | wc -l
2003
:-(
-Tony
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h index 5d7494631ea9..a46c78381388 100644 --- a/arch/x86/include/asm/mmu.h +++ b/arch/x86/include/asm/mmu.h @@ -66,4 +66,11 @@ typedef struct { void leave_mm(int cpu); #define leave_mm leave_mm +#if defined(CONFIG_X86_MCE) && defined(CONFIG_MEMORY_FAILURE) +#undef arch_memory_failure +#define arch_memory_failure x86_memory_failure +#endif + +bool x86_memory_failure(u32 __user *addr); + #endif /* _ASM_X86_MMU_H */ diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 1bf11213e093..b27aa30290bb 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1236,6 +1236,16 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin *m = *final; } +bool x86_memory_failure(u32 __user *addr) +{ + if (current->mce_busy == 0) + return false; + + WARN_ON(current->mce_vaddr != addr); + + return true; +} + static void kill_me_now(struct callback_head *ch) { force_sig(SIGBUS); diff --git a/include/linux/mm.h b/include/linux/mm.h index ecdf8a8cd6ae..470708a71dd3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3177,5 +3177,9 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping, extern int sysctl_nr_trim_pages; +#ifndef arch_memory_failure +#define arch_memory_failure(vaddr) (0) +#endif + #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/kernel/futex.c b/kernel/futex.c index c47d1015d759..8fa2fc854026 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2658,6 +2658,9 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, if (ret) { queue_unlock(*hb); + if (arch_memory_failure(uaddr)) + return ret; + ret = get_user(uval, uaddr); if (ret) return ret;
futex_wait_setup() first tries to read the user value with page faults disabled (because it holds a lock, and so cannot sleep). If that read fails it drops the lock and tries again. But there are now two reasons why the user space read can fail. Either: 1) legacy case of a page fault, in which case it is reasonable to retry 2) machine check on the user address, bad idea to re-read Add some infrastructure to differentiate these cases. Signed-off-by: Tony Luck <tony.luck@intel.com> --- arch/x86/include/asm/mmu.h | 7 +++++++ arch/x86/kernel/cpu/mce/core.c | 10 ++++++++++ include/linux/mm.h | 4 ++++ kernel/futex.c | 3 +++ 4 files changed, 24 insertions(+)