Message ID | 20190215200533.ypfrdekg7j4ucu6a@linutronix.de (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFC] ARM: enable irq in translation/section permission fault handlers | expand |
On 2/15/19 9:05 PM, Sebastian Andrzej Siewior wrote: > From: "Yadi.hu" <yadi.hu@windriver.com> > > Accessing a kernel address in user space causes a SIGSEGV which is sent > via > -> do_DataAbort > -> do_sect_fault || do_translation_fault > -> do_bad_area > -> __do_user_fault > -> force_sig_fault > Since commit > > 02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers") > > that path is carried out with disabled interrupts. Page/alignment fault > do enable interrupts but data abort has been left out. > > On -RT the siglock is a sleeping spinlock and requires interrupts to be > enabled in order to acquire it. > > Enable interrupts in the DataAbort handler if the parent context had > interrupts enabled. Move harden_branch_predictor() before interrupts are > enabled. > > Reported-by: <Bernd Edlinger <bernd.edlinger@hotmail.de> > Signed-off-by: Yadi.hu <yadi.hu@windriver.com> > [bigeasy: rewrote patch description, reordered patch] > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > I though that the "interrupt enable part" has already been posted and > then Bernd complained about a warning from harden_branch_predictor() on > -RT so here it is. > > arch/arm/mm/fault.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c > index 58f69fa07df95..da82967865836 100644 > --- a/arch/arm/mm/fault.c > +++ b/arch/arm/mm/fault.c > @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr, > unsigned int fsr, unsigned int sig, int code, > struct pt_regs *regs) > { > - if (addr > TASK_SIZE) > - harden_branch_predictor(); > > #ifdef CONFIG_DEBUG_USER > if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) || > @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs) > struct task_struct *tsk = current; > struct mm_struct *mm = tsk->active_mm; > > + if (addr > TASK_SIZE && user_mode(regs)) > + harden_branch_predictor(); This is somehow inconsisten with do_translation_fault, where we have this: if (addr < TASK_SIZE) return do_page_fault(addr, fsr, regs); > + > + if (interrupts_enabled(regs)) > + local_irq_enable(); > /* > * If we are in kernel mode at this point, we > * have no context to handle this fault with. > I have seen three different failure modes, pleas see the first 3 calls stacks here: https://marc.info/?l=linux-rt-users&m=155016888714927&w=2 I am concerned about this fist issue, because it removes the branch predictor hardening after the do_page_fault has executed: do_DataAbort->do_page_fault(addr>TASK_SIZE)->__do_user_fault This is reachable because do_page_fault is not only called from do_translation_fault but also from here: arch/arm/mm/fsr-2level.c and here: arch/arm/mm/fsr-3level.c those are callable with addr > TASK_SIZE And the following code path does enable the hard irqs before do_bad_area: do_DataAbort->do_sect_fault->do_bad_area->__do_user_fault So this function, would need to be rewritten: do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) { if (interrupts_enabled(regs)) local_irq_enable(); do_bad_area(addr, fsr, regs); return 0; } BTW, the complete linux tree I am testing right now is here: https://github.com/bernd-edlinger/linux-rt in the branch: patches-rel-v4.14-rt Thanks Bernd.
On Fri, Feb 15, 2019 at 09:05:33PM +0100, Sebastian Andrzej Siewior wrote: > From: "Yadi.hu" <yadi.hu@windriver.com> > > Accessing a kernel address in user space causes a SIGSEGV which is sent > via > -> do_DataAbort > -> do_sect_fault || do_translation_fault > -> do_bad_area > -> __do_user_fault > -> force_sig_fault > Since commit > > 02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers") > > that path is carried out with disabled interrupts. Page/alignment fault > do enable interrupts but data abort has been left out. > > On -RT the siglock is a sleeping spinlock and requires interrupts to be > enabled in order to acquire it. > > Enable interrupts in the DataAbort handler if the parent context had > interrupts enabled. Move harden_branch_predictor() before interrupts are > enabled. One very very big NAK. You haven't given any reasoning for moving harden_branch_predictor(). Moving it in this way, you are re-opening systems up for branch predictor attacks, since there is now a path through the data abort handler where userspace can attempt to access kernel space, resulting in a fault being generated, but the branch predictor hardening is no longer called. > Reported-by: <Bernd Edlinger <bernd.edlinger@hotmail.de> > Signed-off-by: Yadi.hu <yadi.hu@windriver.com> > [bigeasy: rewrote patch description, reordered patch] > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > I though that the "interrupt enable part" has already been posted and > then Bernd complained about a warning from harden_branch_predictor() on > -RT so here it is. > > arch/arm/mm/fault.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c > index 58f69fa07df95..da82967865836 100644 > --- a/arch/arm/mm/fault.c > +++ b/arch/arm/mm/fault.c > @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr, > unsigned int fsr, unsigned int sig, int code, > struct pt_regs *regs) > { > - if (addr > TASK_SIZE) > - harden_branch_predictor(); > > #ifdef CONFIG_DEBUG_USER > if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) || > @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs) > struct task_struct *tsk = current; > struct mm_struct *mm = tsk->active_mm; > > + if (addr > TASK_SIZE && user_mode(regs)) > + harden_branch_predictor(); > + > + if (interrupts_enabled(regs)) > + local_irq_enable(); > /* > * If we are in kernel mode at this point, we > * have no context to handle this fault with. > -- > 2.20.1 >
On 2019-02-15 21:57:56 [+0000], Bernd Edlinger wrote: > > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c > > index 58f69fa07df95..da82967865836 100644 > > --- a/arch/arm/mm/fault.c > > +++ b/arch/arm/mm/fault.c > > @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr, > > unsigned int fsr, unsigned int sig, int code, > > struct pt_regs *regs) > > { > > - if (addr > TASK_SIZE) > > - harden_branch_predictor(); > > > > #ifdef CONFIG_DEBUG_USER > > if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) || > > @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs) > > struct task_struct *tsk = current; > > struct mm_struct *mm = tsk->active_mm; > > > > + if (addr > TASK_SIZE && user_mode(regs)) > > + harden_branch_predictor(); > > This is somehow inconsisten with do_translation_fault, where > we have this: > > if (addr < TASK_SIZE) > return do_page_fault(addr, fsr, regs); yes but harden_branch_predictor() is only invoked for addr > TASK_SIZE. What do I miss? > > + > > + if (interrupts_enabled(regs)) > > + local_irq_enable(); > > /* > > * If we are in kernel mode at this point, we > > * have no context to handle this fault with. > > > > I have seen three different failure modes, pleas see the first 3 calls stacks > here: https://marc.info/?l=linux-rt-users&m=155016888714927&w=2 yes, but this is only with the one patch in RT. So you should not see this without the RT patch. > I am concerned about this fist issue, because it removes the branch > predictor hardening after the do_page_fault has executed: > > do_DataAbort->do_page_fault(addr>TASK_SIZE)->__do_user_fault > > This is reachable because do_page_fault is not only called from > do_translation_fault but also from here: arch/arm/mm/fsr-2level.c > and here: arch/arm/mm/fsr-3level.c > those are callable with addr > TASK_SIZE okay. So 0xbffffff0 without LPAE would be left out. I wasn't ware of that. And this indeed it hits the warning. > And the following code path does enable the hard irqs before do_bad_area: > do_DataAbort->do_sect_fault->do_bad_area->__do_user_fault > > So this function, would need to be rewritten: > > do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) > { > if (interrupts_enabled(regs)) > local_irq_enable(); > > do_bad_area(addr, fsr, regs); > return 0; > } We would need to move the branch predictor piece before enabling interrupts. > > Thanks > Bernd. Sebastian
On 2019-02-16 11:33:38 [+0000], Russell King - ARM Linux admin wrote: > On Fri, Feb 15, 2019 at 09:05:33PM +0100, Sebastian Andrzej Siewior wrote: > > From: "Yadi.hu" <yadi.hu@windriver.com> > > > > Accessing a kernel address in user space causes a SIGSEGV which is sent > > via > > -> do_DataAbort > > -> do_sect_fault || do_translation_fault > > -> do_bad_area > > -> __do_user_fault > > -> force_sig_fault > > Since commit > > > > 02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers") > > > > that path is carried out with disabled interrupts. Page/alignment fault > > do enable interrupts but data abort has been left out. > > > > On -RT the siglock is a sleeping spinlock and requires interrupts to be > > enabled in order to acquire it. > > > > Enable interrupts in the DataAbort handler if the parent context had > > interrupts enabled. Move harden_branch_predictor() before interrupts are > > enabled. > > One very very big NAK. > > You haven't given any reasoning for moving harden_branch_predictor(). if do_bad_area() opens interrupts then harden_branch_predictor() will complain via smp_processor_id(). So it looked obvious to move harden_branch_predictor() before interrupts are enabled. > Moving it in this way, you are re-opening systems up for branch > predictor attacks, since there is now a path through the data abort > handler where userspace can attempt to access kernel space, resulting > in a fault being generated, but the branch predictor hardening is no > longer called. Bernd Edlinger explained the missing piece to me. With PAGE_OFFSET=0xC0000000 and without LPAE a R/W of 0xbffffff0 will ends up with: | ~/mem-tc 0xbffffff0 1 |Write 0xbffffff0 |BUG: using smp_processor_id() in preemptible [00000000] code: mem-tc/706 |caller is __do_user_fault.constprop.2+0x78/0xa4 |CPU: 3 PID: 706 Comm: mem-tc Tainted: G W 5.0.0-rc6-00001-g625f719a23bda-dirty #35 |Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) |[<c01123d4>] (unwind_backtrace) from [<c010cc58>] (show_stack+0x10/0x14) |[<c010cc58>] (show_stack) from [<c0a40988>] (dump_stack+0x78/0x8c) |[<c0a40988>] (dump_stack) from [<c045427c>] (check_preemption_disabled+0x100/0x11c) |[<c045427c>] (check_preemption_disabled) from [<c01164e4>] (__do_user_fault.constprop.2+0x78/0xa4) |[<c01164e4>] (__do_user_fault.constprop.2) from [<c01168fc>] (do_page_fault+0x378/0x72c) |[<c01168fc>] (do_page_fault) from [<c0116e24>] (do_DataAbort+0x3c/0xa8) |[<c0116e24>] (do_DataAbort) from [<c0101d9c>] (__dabt_usr+0x3c/0x40) |Exception stack(0xec899fb0 to 0xec899ff8) |9fa0: 00000000 00000000 00000001 12345678 |9fc0: 00000000 004f7000 bffffff0 bffffff0 bedc9df4 bedc9c7c 004f7000 00000000 |9fe0: 00000000 bedc9c78 004e6559 004e6562 60010030 ffffffff |Segmentation fault So should harden_branch_predictor() be invoked before interrupts are enabled? Sebastian
On 2/20/19 12:00 PM, Sebastian Andrzej Siewior wrote: > > Bernd Edlinger explained the missing piece to me. With > PAGE_OFFSET=0xC0000000 and without LPAE a R/W of 0xbffffff0 will ends up > with: > I am unable to reproduce on my target, wheter do_page_fault is directly taken, or do_translation_fault is taken depends on whether the page table root directory permission bits deny the access or the second level page table deny the access? But if there is a way to get into the code path do_DataAbort->do_page_fault with addr > TASK_SIZE, then I don't see why that works without the RT patch, since the interrupts are enabled here: do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) { struct task_struct *tsk; struct mm_struct *mm; int sig, code; vm_fault_t fault; unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; if (notify_page_fault(regs, fsr)) return 0; tsk = current; mm = tsk->mm; /* Enable interrupts if they were enabled in the parent context. */ if (interrupts_enabled(regs)) local_irq_enable(); Is preemption disabled at this point, without the RT patch? I still don't quite see why is this no issue without the RT patch. Can someone explain that? Thanks Bernd.
> But if there is a way to get into the code path do_DataAbort->do_page_fault with > addr > TASK_SIZE, then I don't see why that works without the RT patch, > since the interrupts are enabled here: > > do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) > { > struct task_struct *tsk; > struct mm_struct *mm; > int sig, code; > vm_fault_t fault; > unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > if (notify_page_fault(regs, fsr)) > return 0; > > tsk = current; > mm = tsk->mm; > > /* Enable interrupts if they were enabled in the parent context. */ > if (interrupts_enabled(regs)) > local_irq_enable(); > > Is preemption disabled at this point, without the RT patch? > > I still don't quite see why is this no issue without the RT patch. > Can someone explain that? Never mind, now I found out how to prove that it is not a RT issue: $ cat t2.c int main() { *((char*)0xFFFFFFFF) = 1; } $ gcc t2.c -o t2 $ ./t2 BUG: using smp_processor_id() in preemptible [00000000] code: t2/832 caller is debug_smp_processor_id+0x18/0x24 CPU: 1 PID: 832 Comm: t2 Not tainted 4.18.0-00005-g7523d7f #1 Hardware name: Altera SOCFPGA [<80112e68>] (unwind_backtrace) from [<8010da28>] (show_stack+0x20/0x24) [<8010da28>] (show_stack) from [<806bcaec>] (dump_stack+0x78/0x94) [<806bcaec>] (dump_stack) from [<8043f2d0>] (check_preemption_disabled+0xe4/0x120) [<8043f2d0>] (check_preemption_disabled) from [<8043f324>] (debug_smp_processor_id+0x18/0x24) [<8043f324>] (debug_smp_processor_id) from [<80116ba0>] (__do_user_fault+0x48/0x13c) [<80116ba0>] (__do_user_fault) from [<8011700c>] (do_page_fault+0x2f4/0x328) [<8011700c>] (do_page_fault) from [<80117200>] (do_DataAbort+0x58/0x100) [<80117200>] (do_DataAbort) from [<80102460>] (__dabt_usr+0x40/0x60) Exception stack(0xbf197fb0 to 0xbf197ff8) 7fa0: 00000001 7e963d94 00000001 ffffffff 7fc0: 000103f8 00000000 000102e0 00000000 00000000 00000000 76fa9000 7e963c3c 7fe0: 00000000 7e963c3c 76e47bbc 000103e0 60060010 ffffffff Segmentation fault What exactly is the reason why we should not apply this patch in the non-RT kernel as well: https://marc.info/?l=linux-rt-users&m=154997565610392&w=2 Thanks Bernd.
On 2019-02-21 09:31:53 [+0000], Bernd Edlinger wrote: > Never mind, now I found out how to prove that it is not a RT issue: > > $ cat t2.c > int main() { > *((char*)0xFFFFFFFF) = 1; > } I've sent another example to in this thread to rmk. > $ gcc t2.c -o t2 > $ ./t2 > BUG: using smp_processor_id() in preemptible [00000000] code: t2/832 > caller is debug_smp_processor_id+0x18/0x24 > CPU: 1 PID: 832 Comm: t2 Not tainted 4.18.0-00005-g7523d7f #1 > Hardware name: Altera SOCFPGA > [<80112e68>] (unwind_backtrace) from [<8010da28>] (show_stack+0x20/0x24) > [<8010da28>] (show_stack) from [<806bcaec>] (dump_stack+0x78/0x94) > [<806bcaec>] (dump_stack) from [<8043f2d0>] (check_preemption_disabled+0xe4/0x120) > [<8043f2d0>] (check_preemption_disabled) from [<8043f324>] (debug_smp_processor_id+0x18/0x24) > [<8043f324>] (debug_smp_processor_id) from [<80116ba0>] (__do_user_fault+0x48/0x13c) > [<80116ba0>] (__do_user_fault) from [<8011700c>] (do_page_fault+0x2f4/0x328) > [<8011700c>] (do_page_fault) from [<80117200>] (do_DataAbort+0x58/0x100) > [<80117200>] (do_DataAbort) from [<80102460>] (__dabt_usr+0x40/0x60) > Exception stack(0xbf197fb0 to 0xbf197ff8) > 7fa0: 00000001 7e963d94 00000001 ffffffff > 7fc0: 000103f8 00000000 000102e0 00000000 00000000 00000000 76fa9000 7e963c3c > 7fe0: 00000000 7e963c3c 76e47bbc 000103e0 60060010 ffffffff > Segmentation fault > > > What exactly is the reason why we should not apply this patch in the non-RT kernel as well: > > https://marc.info/?l=linux-rt-users&m=154997565610392&w=2 I'm not sure that this is the right thing to do. I think you should invoke harden_branch_predictor() before the interrupts are enabled so it is invoked on the same CPU that triggered the exception. > Thanks > Bernd. Sebastian
On 2/21/19 10:57 AM, Sebastian Andrzej Siewior wrote: > On 2019-02-21 09:31:53 [+0000], Bernd Edlinger wrote: >> >> What exactly is the reason why we should not apply this patch in the non-RT kernel as well: >> >> https://marc.info/?l=linux-rt-users&m=154997565610392&w=2 > > I'm not sure that this is the right thing to do. I think you should > invoke harden_branch_predictor() before the interrupts are enabled so it > is invoked on the same CPU that triggered the exception. > When is there any secret information in the branch predictor cache ? When do_DataAbort calls do_translation_fault? Or after do_page_fault has executed, and is about to raise a sig-fault? Doesn't do_page_fault have access to information that an attacker would like to know? Bernd.
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 58f69fa07df95..da82967865836 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr, unsigned int fsr, unsigned int sig, int code, struct pt_regs *regs) { - if (addr > TASK_SIZE) - harden_branch_predictor(); #ifdef CONFIG_DEBUG_USER if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) || @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs) struct task_struct *tsk = current; struct mm_struct *mm = tsk->active_mm; + if (addr > TASK_SIZE && user_mode(regs)) + harden_branch_predictor(); + + if (interrupts_enabled(regs)) + local_irq_enable(); /* * If we are in kernel mode at this point, we * have no context to handle this fault with.