Message ID | e6c57f675e5b53d4de266412aa526b7660c47918.1554248002.git.khalid.aziz@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add support for eXclusive Page Frame Ownership |
On Wed, Apr 3, 2019 at 10:36 AM Khalid Aziz <khalid.aziz@oracle.com> wrote:
>
> From: Tycho Andersen <tycho@tycho.ws>
>
> Oopsing might kill the task, via rewind_stack_do_exit() at the bottom, and
> that might sleep:
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9d5c75f02295..7891add0913f 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -858,6 +858,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  	/* Executive summary in case the body of the oops scrolled away */
>  	printk(KERN_DEFAULT "CR2: %016lx\n", address);
>
> +	/*
> +	 * We're about to oops, which might kill the task. Make sure we're
> +	 * allowed to sleep.
> +	 */
> +	flags |= X86_EFLAGS_IF;
> +
>  	oops_end(flags, regs, sig);
>  }
>

NAK. If there's a bug in rewind_stack_do_exit(), please fix it in
rewind_stack_do_exit().

On Wed, Apr 03, 2019 at 05:12:56PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 3, 2019 at 10:36 AM Khalid Aziz <khalid.aziz@oracle.com> wrote:
> >
> > From: Tycho Andersen <tycho@tycho.ws>
> >
> > Oopsing might kill the task, via rewind_stack_do_exit() at the bottom, and
> > that might sleep:
> >
> >
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 9d5c75f02295..7891add0913f 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -858,6 +858,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> >  	/* Executive summary in case the body of the oops scrolled away */
> >  	printk(KERN_DEFAULT "CR2: %016lx\n", address);
> >
> > +	/*
> > +	 * We're about to oops, which might kill the task. Make sure we're
> > +	 * allowed to sleep.
> > +	 */
> > +	flags |= X86_EFLAGS_IF;
> > +
> >  	oops_end(flags, regs, sig);
> >  }
> >
>
>
> NAK. If there's a bug in rewind_stack_do_exit(), please fix it in
> rewind_stack_do_exit().

[I trimmed the CC list since google rejected it with E2BIG :)]

I guess the problem is really that do_exit() (or really
exit_signals()) might sleep. Maybe we should put an irq_enable() at
the beginning of do_exit() instead and fix this problem for all
arches?

Tycho

On Wed, Apr 3, 2019 at 6:42 PM Tycho Andersen <tycho@tycho.ws> wrote:
>
> On Wed, Apr 03, 2019 at 05:12:56PM -0700, Andy Lutomirski wrote:
> > On Wed, Apr 3, 2019 at 10:36 AM Khalid Aziz <khalid.aziz@oracle.com> wrote:
> > >
> > > From: Tycho Andersen <tycho@tycho.ws>
> > >
> > > Oopsing might kill the task, via rewind_stack_do_exit() at the bottom, and
> > > that might sleep:
> > >
> > >
> > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > index 9d5c75f02295..7891add0913f 100644
> > > --- a/arch/x86/mm/fault.c
> > > +++ b/arch/x86/mm/fault.c
> > > @@ -858,6 +858,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> > >  	/* Executive summary in case the body of the oops scrolled away */
> > >  	printk(KERN_DEFAULT "CR2: %016lx\n", address);
> > >
> > > +	/*
> > > +	 * We're about to oops, which might kill the task. Make sure we're
> > > +	 * allowed to sleep.
> > > +	 */
> > > +	flags |= X86_EFLAGS_IF;
> > > +
> > >  	oops_end(flags, regs, sig);
> > >  }
> > >
> >
> >
> > NAK. If there's a bug in rewind_stack_do_exit(), please fix it in
> > rewind_stack_do_exit().
>
> [I trimmed the CC list since google rejected it with E2BIG :)]
>
> I guess the problem is really that do_exit() (or really
> exit_signals()) might sleep. Maybe we should put an irq_enable() at
> the beginning of do_exit() instead and fix this problem for all
> arches?
>

Hmm. do_exit() isn't really meant to be "try your best to leave the
system somewhat usable without returning" -- it's a function that,
other than in OOPSes, is called from a well-defined state. So I think
rewind_stack_do_exit() is probably a better spot. But we need to
rewind the stack and *then* turn on IRQs, since we otherwise risk
exploding quite badly.

On Wed, Apr 03, 2019 at 09:12:16PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 3, 2019 at 6:42 PM Tycho Andersen <tycho@tycho.ws> wrote:
> >
> > On Wed, Apr 03, 2019 at 05:12:56PM -0700, Andy Lutomirski wrote:
> > > On Wed, Apr 3, 2019 at 10:36 AM Khalid Aziz <khalid.aziz@oracle.com> wrote:
> > > >
> > > > From: Tycho Andersen <tycho@tycho.ws>
> > > >
> > > > Oopsing might kill the task, via rewind_stack_do_exit() at the bottom, and
> > > > that might sleep:
> > > >
> > > >
> > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > > index 9d5c75f02295..7891add0913f 100644
> > > > --- a/arch/x86/mm/fault.c
> > > > +++ b/arch/x86/mm/fault.c
> > > > @@ -858,6 +858,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> > > >  	/* Executive summary in case the body of the oops scrolled away */
> > > >  	printk(KERN_DEFAULT "CR2: %016lx\n", address);
> > > >
> > > > +	/*
> > > > +	 * We're about to oops, which might kill the task. Make sure we're
> > > > +	 * allowed to sleep.
> > > > +	 */
> > > > +	flags |= X86_EFLAGS_IF;
> > > > +
> > > >  	oops_end(flags, regs, sig);
> > > >  }
> > > >
> > >
> > >
> > > NAK. If there's a bug in rewind_stack_do_exit(), please fix it in
> > > rewind_stack_do_exit().
> >
> > [I trimmed the CC list since google rejected it with E2BIG :)]
> >
> > I guess the problem is really that do_exit() (or really
> > exit_signals()) might sleep. Maybe we should put an irq_enable() at
> > the beginning of do_exit() instead and fix this problem for all
> > arches?
> >
>
> Hmm. do_exit() isn't really meant to be "try your best to leave the
> system somewhat usable without returning" -- it's a function that,
> other than in OOPSes, is called from a well-defined state. So I think
> rewind_stack_do_exit() is probably a better spot. But we need to
> rewind the stack and *then* turn on IRQs, since we otherwise risk
> exploding quite badly.

Ok, sounds good. I guess we can include something like this patch in
the next series.

Thanks,

Tycho

From 34dce229a4f43f90db823671eb0b8da7c4906045 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho@tycho.ws>
Date: Thu, 4 Apr 2019 09:41:32 -0600
Subject: [PATCH] x86/entry: re-enable interrupts before exiting

If the kernel oopses in an interrupt, nothing re-enables interrupts:

Aug 23 19:30:27 xpfo kernel: [ 38.302714] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33
Aug 23 19:30:27 xpfo kernel: [ 38.303837] in_atomic(): 0, irqs_disabled(): 1, pid: 1970, name: lkdtm_xpfo_test
Aug 23 19:30:27 xpfo kernel: [ 38.304758] CPU: 3 PID: 1970 Comm: lkdtm_xpfo_test Tainted: G D 4.13.0-rc5+ #228
Aug 23 19:30:27 xpfo kernel: [ 38.305813] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
Aug 23 19:30:27 xpfo kernel: [ 38.306926] Call Trace:
Aug 23 19:30:27 xpfo kernel: [ 38.307243] dump_stack+0x63/0x8b
Aug 23 19:30:27 xpfo kernel: [ 38.307665] ___might_sleep+0xec/0x110
Aug 23 19:30:27 xpfo kernel: [ 38.308139] __might_sleep+0x45/0x80
Aug 23 19:30:27 xpfo kernel: [ 38.308593] exit_signals+0x21/0x1c0
Aug 23 19:30:27 xpfo kernel: [ 38.309046] ? blocking_notifier_call_chain+0x11/0x20
Aug 23 19:30:27 xpfo kernel: [ 38.309677] do_exit+0x98/0xbf0
Aug 23 19:30:27 xpfo kernel: [ 38.310078] ? smp_reader+0x27/0x40 [lkdtm]
Aug 23 19:30:27 xpfo kernel: [ 38.310604] ? kthread+0x10f/0x150
Aug 23 19:30:27 xpfo kernel: [ 38.311045] ? read_user_with_flags+0x60/0x60 [lkdtm]
Aug 23 19:30:27 xpfo kernel: [ 38.311680] rewind_stack_do_exit+0x17/0x20

do_exit() expects to be called in a well-defined environment, so let's
re-enable interrupts after unwinding the stack, in case they were disabled.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
---
 arch/x86/entry/entry_32.S | 6 ++++++
 arch/x86/entry/entry_64.S | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d309f30cf7af..8ddb7b41669d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1507,6 +1507,12 @@ ENTRY(rewind_stack_do_exit)
 	movl PER_CPU_VAR(cpu_current_top_of_stack), %esi
 	leal -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp

+	/*
+	 * If we oopsed in an interrupt handler, interrupts may be off. Let's turn
+	 * them back on before going back to "normal" code.
+	 */
+	sti
+
 	call do_exit
 1:	jmp 1b
 END(rewind_stack_do_exit)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..c0759f3e3ad2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1672,5 +1672,11 @@ ENTRY(rewind_stack_do_exit)
 	leaq -PTREGS_SIZE(%rax), %rsp
 	UNWIND_HINT_FUNC sp_offset=PTREGS_SIZE

+	/*
+	 * If we oopsed in an interrupt handler, interrupts may be off. Let's turn
+	 * them back on before going back to "normal" code.
+	 */
+	sti
+
 	call do_exit
 END(rewind_stack_do_exit)

- stepping on del button while browsing through CCs.

On 2019-04-04 09:47:27 [-0600], Tycho Andersen wrote:
> > Hmm. do_exit() isn't really meant to be "try your best to leave the
> > system somewhat usable without returning" -- it's a function that,
> > other than in OOPSes, is called from a well-defined state. So I think
> > rewind_stack_do_exit() is probably a better spot. But we need to
> > rewind the stack and *then* turn on IRQs, since we otherwise risk
> > exploding quite badly.
>
> Ok, sounds good. I guess we can include something like this patch in
> the next series.

The tracing infrastructure probably doesn't know that the interrupts are
back on. Also if you were holding a spin lock then your preempt count
isn't 0 which means that might_sleep() will trigger a splat (in your
backtrace it was zero).

> Thanks,
>
> Tycho

Sebastian

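Sebastian's point about the preempt count can be illustrated with a simplified userspace model of the condition behind the "sleeping function called from invalid context" splat. The struct and function names below are hypothetical, and the real ___might_sleep() consults more state than this. The sketch only aims to show that turning interrupts back on is not sufficient if a spinlock was still held at the time of the oops.

```c
#include <stdbool.h>

/* Hypothetical model of the context state the check looks at. */
struct ctx_model {
	int  preempt_count;   /* assumed non-zero while a spinlock is held */
	bool irqs_disabled;   /* true while interrupts are off */
};

/*
 * Simplified stand-in for the might_sleep() condition: in this model,
 * sleeping is only legal when the preempt count is zero and interrupts
 * are enabled.
 */
static bool model_would_splat(const struct ctx_model *c)
{
	return c->preempt_count != 0 || c->irqs_disabled;
}

/* Models the effect of the proposed sti: only the IRQ state changes. */
static void model_enable_irqs(struct ctx_model *c)
{
	c->irqs_disabled = false;
}
```

With the state from the backtrace above (in_atomic(): 0, irqs_disabled(): 1), enabling interrupts clears the condition in this model. With a non-zero preempt count it does not.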
On Thu, 4 Apr 2019, Tycho Andersen wrote:
> 	leaq -PTREGS_SIZE(%rax), %rsp
> 	UNWIND_HINT_FUNC sp_offset=PTREGS_SIZE
>
> +	/*
> +	 * If we oopsed in an interrupt handler, interrupts may be off. Let's turn
> +	 * them back on before going back to "normal" code.
> +	 */
> +	sti

That breaks the paravirt muck and tracing/lockdep.

ENABLE_INTERRUPTS() is what you want plus TRACE_IRQ_ON to keep the tracer
and lockdep happy.

Thanks,

	tglx

> On Apr 4, 2019, at 10:28 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Thu, 4 Apr 2019, Tycho Andersen wrote:
>> 	leaq -PTREGS_SIZE(%rax), %rsp
>> 	UNWIND_HINT_FUNC sp_offset=PTREGS_SIZE
>>
>> +	/*
>> +	 * If we oopsed in an interrupt handler, interrupts may be off. Let's turn
>> +	 * them back on before going back to "normal" code.
>> +	 */
>> +	sti
>
> That breaks the paravirt muck and tracing/lockdep.
>
> ENABLE_INTERRUPTS() is what you want plus TRACE_IRQ_ON to keep the tracer
> and lockdep happy.
>
>

I’m sure we’ll find some other thing we forgot to reset eventually, so
let’s do this in C. Change the call do_exit to call
__finish_rewind_stack_do_exit and add the latter as a C function that
does local_irq_enable() and do_exit().

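The suggestion above could look roughly like the following. This is a userspace sketch, not the kernel change: local_irq_enable() and do_exit() are replaced by mock stand-ins so the ordering can be exercised outside the kernel, the mock_* variables are hypothetical, the `long code` parameter is assumed to match do_exit()'s argument, and the real do_exit() never returns.

```c
#include <stdbool.h>

/* Hypothetical mock state standing in for the CPU interrupt flag. */
static bool mock_irqs_enabled;
static bool mock_exit_saw_irqs_enabled;
static long mock_exit_code;

/* Mock of local_irq_enable(): here it only flips the modelled flag. */
static void local_irq_enable(void)
{
	mock_irqs_enabled = true;
}

/*
 * Mock of do_exit(): records the state it was called in. Unlike the
 * real function, it returns so that the sketch can be tested.
 */
static void do_exit(long code)
{
	mock_exit_saw_irqs_enabled = mock_irqs_enabled;
	mock_exit_code = code;
}

/*
 * The C helper named in the suggestion. The assembly stub would rewind
 * the stack first and then call this instead of calling do_exit().
 */
void __finish_rewind_stack_do_exit(long code)
{
	local_irq_enable();
	do_exit(code);
}
```

Keeping the helper in C means any further state that turns out to need resetting before do_exit() can be added in one place, for both entry_32.S and entry_64.S.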
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d5c75f02295..7891add0913f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -858,6 +858,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	/* Executive summary in case the body of the oops scrolled away */
 	printk(KERN_DEFAULT "CR2: %016lx\n", address);

+	/*
+	 * We're about to oops, which might kill the task. Make sure we're
+	 * allowed to sleep.
+	 */
+	flags |= X86_EFLAGS_IF;
+
 	oops_end(flags, regs, sig);
 }