Message ID | 20151203110041.GK8644@n2100.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Russell King wrote: > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote: > > I wrote: > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the > > > following (or very similar) on boot. > > > > I should have said "if I don't disable", as the option is "default y". > > > > Also, if it survives on boot, below is an example of later trouble (after > > 30+ minutes on this occasion). > > Please apply this patch so we (might) get a slightly better oops dump, > and then try to reproduce. Sure thing, but it's still "DAC: 00000051"... Cheers, Peter Unhandled fault: page domain fault (0x81b) at 0x00086578 pgd = c280c000 [00086578] *pgd=22f77831, *pte=2347034f, *ppte=2347083f Internal error: : 81b [#1] ARM Modules linked in: CPU: 0 PID: 904 Comm: ntpd Not tainted 4.3.0+ #30 Hardware name: Atmel SAMA5 task: c39bb500 ti: c2fca000 task.ti: c2fca000 PC is at memcpy+0x50/0x330 LR is at 0x1 pc : [<c01daff0>] lr : [<00000001>] psr: 800f0013 sp : c2fcbed4 ip : 00000007 fp : 000d3808 r10: 00000000 r9 : 00000080 r8 : 00000001 r7 : 00000010 r6 : 00000010 r5 : 003f5c28 r4 : 00000000 r3 : 00000002 r2 : ffffffe0 r1 : c2fcbf38 r0 : 00086578 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c53c7d Table: 2280c059 DAC: 00000051 Process ntpd (pid: 904, stack limit = 0xc2fca208) Stack: (0xc2fcbed4 to 0xc2fcc000) bec0: 00000000 c2fca000 00000000 bee0: c2fcbf18 00086578 00086578 c01e76c4 c2f77218 c2ff70f4 000d4344 00086578 bf00: 00000000 00000051 0000007c c0010224 c2fca000 c004c988 00000002 00000000 bf20: 003f5c28 00000010 00000010 00000001 00000007 00000001 01f40000 5660283b bf40: 000ae9dc 00002710 00000000 00000000 00000000 00000000 00000000 00000000 bf60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 bf80: 00000000 00000000 00000000 00000000 00000000 00000000 000d3968 0005e38f bfa0: 000d4344 c0010060 000d3968 0005e38f 00086578 00000000 ffffffff 00000001 bfc0: 000d3968 0005e38f 000d4344 0000007c 000d3814 00000001 000d4338 000d3808 bfe0: 00081e7c bec3e8b4 00026d3d b6ced126 00000030 00086578 8499c1b5 48743d49 [<c01daff0>] (memcpy) from [<c01e76c4>] (__copy_to_user_memcpy+0x138/0x17c) [<c01e76c4>] (__copy_to_user_memcpy) from [<c004c988>] (SyS_adjtimex+0xd4/0xf0) [<c004c988>] (SyS_adjtimex) from [<c0010060>] (ret_fast_syscall+0x0/0x3c) Code: f5d1f05c f5d1f07c e8b151f8 e2522020 (e8a051f8) ---[ end trace bd9256bb17081c58 ]---
On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote: > Russell King wrote: > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote: > > > I wrote: > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the > > > > following (or very similar) on boot. > > > > > > I should have said "if I don't disable", as the option is "default y". > > > > > > Also, if it survives on boot, below is an example of later trouble (after > > > 30+ minutes on this occasion). > > > > Please apply this patch so we (might) get a slightly better oops dump, > > and then try to reproduce. > > Sure thing, but it's still "DAC: 00000051"... Thanks, that confirms that something in the uaccess-with-memcpy code is clearing the DACR back to 0x51. This has come up several times before, and I really can't spot the problem in this code, so I've always said to disable the uaccess-with-memcpy code. Personally, I'd like to see the back of that code...
Russell King wrote: > On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote: > > Russell King wrote: > > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote: > > > > I wrote: > > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the > > > > > following (or very similar) on boot. > > > > > > > > I should have said "if I don't disable", as the option is "default y". > > > > > > > > Also, if it survives on boot, below is an example of later trouble (after > > > > 30+ minutes on this occasion). > > > > > > Please apply this patch so we (might) get a slightly better oops dump, > > > and then try to reproduce. > > > > Sure thing, but it's still "DAC: 00000051"... > > Thanks, that confirms that something in the uaccess-with-memcpy code > is clearing the DACR back to 0x51. > > This has come up several times before, and I really can't spot the > problem in this code, so I've always said to disable the > uaccess-with-memcpy code. Personally, I'd like to see the back > of that code... Ok, but it's not me doing crazy things if that's what you are implying (not saying that you are), because the sama5_defconfig has CONFIG_UACCESS_WITH_MEMCPY=y So, something needs to happen or sama5 remains default-broken. Cheers, Peter
On Thu, Dec 03, 2015 at 12:08:11PM +0000, Peter Rosin wrote: > Russell King wrote: > > On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote: > > > Russell King wrote: > > > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote: > > > > > I wrote: > > > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the > > > > > > following (or very similar) on boot. > > > > > > > > > > I should have said "if I don't disable", as the option is "default y". > > > > > > > > > > Also, if it survives on boot, below is an example of later trouble (after > > > > > 30+ minutes on this occasion). > > > > > > > > Please apply this patch so we (might) get a slightly better oops dump, > > > > and then try to reproduce. > > > > > > Sure thing, but it's still "DAC: 00000051"... > > > > Thanks, that confirms that something in the uaccess-with-memcpy code > > is clearing the DACR back to 0x51. > > > > This has come up several times before, and I really can't spot the > > problem in this code, so I've always said to disable the > > uaccess-with-memcpy code. Personally, I'd like to see the back > > of that code... > > Ok, but it's not me doing crazy things if that's what you are implying > (not saying that you are), because the sama5_defconfig has > > CONFIG_UACCESS_WITH_MEMCPY=y > > So, something needs to happen or sama5 remains default-broken. I have no solution for this, other than saying that uaccess-with-memcpy seems to be (for some unknown reason) incompatible with SW PAN. Out of the two features, I'd go for SW PAN over uaccess-with-memcpy, but others will have a different opinion. The real solution is to track down what's going on and why, but I don't think anyone has the motivation to do that. I've looked into this problem, and I've been unable to identify what's going on here. I can see no reason for the DACR to be set to 0x51 in this code. The entry path into this code is via __copy_to_user(), which saves and sets the DACR to 0x55 before calling arm_copy_to_user(), which for uaccess-with-memcpy() is in arch/arm/lib/uaccess_with_memcpy.c. That tail-calls into __copy_to_user_memcpy(), which should run with the DACR set to 0x55. The only place that the DACR is changed is inside __put_user(), which saves the DACR before also setting it to 0x55, and then restores the old DACR value, which, because the old value was 0x55, will have no effect. So, I can see no way that the DACR should ever be 0x51 inside __copy_to_user_memcpy(), but you are seeing such a scenario. I've no idea how you could ever get a value of 0x51 out of the DACR within this code.
Russell King wrote: > On Thu, Dec 03, 2015 at 12:08:11PM +0000, Peter Rosin wrote: > > Russell King wrote: > > > On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote: > > > > Russell King wrote: > > > > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote: > > > > > > I wrote: > > > > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the > > > > > > > following (or very similar) on boot. > > > > > > > > > > > > I should have said "if I don't disable", as the option is "default y". > > > > > > > > > > > > Also, if it survives on boot, below is an example of later trouble (after > > > > > > 30+ minutes on this occasion). > > > > > > > > > > Please apply this patch so we (might) get a slightly better oops dump, > > > > > and then try to reproduce. > > > > > > > > Sure thing, but it's still "DAC: 00000051"... > > > > > > Thanks, that confirms that something in the uaccess-with-memcpy code > > > is clearing the DACR back to 0x51. > > > > > > This has come up several times before, and I really can't spot the > > > problem in this code, so I've always said to disable the > > > uaccess-with-memcpy code. Personally, I'd like to see the back > > > of that code... > > > > Ok, but it's not me doing crazy things if that's what you are implying > > (not saying that you are), because the sama5_defconfig has > > > > CONFIG_UACCESS_WITH_MEMCPY=y > > > > So, something needs to happen or sama5 remains default-broken. > > I have no solution for this, other than saying that uaccess-with-memcpy > seems to be (for some unknown reason) incompatible with SW PAN. > > Out of the two features, I'd go for SW PAN over uaccess-with-memcpy, > but others will have a different opinion. The real solution is to > track down what's going on and why, but I don't think anyone has the > motivation to do that. > > I've looked into this problem, and I've been unable to identify what's > going on here. I can see no reason for the DACR to be set to 0x51 in > this code. > > The entry path into this code is via __copy_to_user(), which saves and > sets the DACR to 0x55 before calling arm_copy_to_user(), which for > uaccess-with-memcpy() is in arch/arm/lib/uaccess_with_memcpy.c. That > tail-calls into __copy_to_user_memcpy(), which should run with the > DACR set to 0x55. > > The only place that the DACR is changed is inside __put_user(), which > saves the DACR before also setting it to 0x55, and then restores the > old DACR value, which, because the old value was 0x55, will have no > effect. > > So, I can see no way that the DACR should ever be 0x51 inside > __copy_to_user_memcpy(), but you are seeing such a scenario. I've no > idea how you could ever get a value of 0x51 out of the DACR within this > code. Since it seems like a race is at the bottom of the observed problems, I'm going to look for things that look racy. The things that stand out to me are: * uaccess.h:modify_domain() does a read-modify-write on DACR using get_domain and set_domain, and I don't see any locking. Is that safe? Why? * uaccess_with_memcpy.c:__copy_to_user() has a mode in which it copies "non-atomically" (if faulthandler_disabled() returns 0). If a fault happens during __copy_to_user, what prevents some other thread from clobbering DACR? * In uaccess.h:uaccess_save_and_enable(), what prevents a context switch between the get_domain and set_domain calls? Just asking questions, I have no prior experience with this code... Cheers, Peter
On Thu, Dec 03, 2015 at 04:12:06PM +0000, Peter Rosin wrote: > Since it seems like a race is at the bottom of the observed problems, I'm > going to look for things that look racy. The things that stand out to me > are: > > * uaccess.h:modify_domain() does a read-modify-write on DACR using > get_domain and set_domain, and I don't see any locking. Is that > safe? Why? It's safe: * the DACR is per-CPU * all exceptions preserve the original DACR value when they return. This is done by storing the DACR value at entry onto the stack, along with the register set, and restoring it along with the register set on exit from exception processing, as if "nothing ever happened". This includes if the exception processing caused a switch to another thread. > * uaccess_with_memcpy.c:__copy_to_user() has a mode in which it copies > "non-atomically" (if faulthandler_disabled() returns 0). If a fault > happens during __copy_to_user, what prevents some other thread from > clobbering DACR? See the second point above. Moreover, if we sleep in down_read(), then __switch_to() reads the current DACR value and saves it in the thread information, and will restore that value when resuming the thread - even if the thread has been migrated to a different CPU. > * In uaccess.h:uaccess_save_and_enable(), what prevents a context > switch between the get_domain and set_domain calls? Nothing, but it doesn't matter, because the DACR register is saved and restored to preserve its value across all exceptions and thread switches. I suspect the only way to nail this down is to litter the uaccess code (virtually every alternate line) with: BUG_ON(get_domain() & domain_mask(DOMAIN_USER) == domain_val(DOMAIN_USER, DOMAIN_NOACCESS)); to narrow down the exact point where the domain register seemingly gets reset. Maybe it'll provide some hint.
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index 7a7c4ce..92789ce 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -95,6 +95,22 @@ void __show_regs(struct pt_regs *regs) { unsigned long flags; char buf[64]; +#ifndef CONFIG_CPU_V7M + unsigned int domain; +#ifdef CONFIG_CPU_SW_DOMAIN_PAN + /* + * Get the domain register for the parent context. In user + * mode, we don't save the DACR, so lets use what it should + * be. For other modes, we place it after the pt_regs struct. + */ + if (user_mode(regs)) + domain = DACR_UACCESS_ENABLE; + else + domain = *(unsigned int *)(regs + 1); +#else + domain = get_domain(); +#endif +#endif show_regs_print_info(KERN_DEFAULT); @@ -123,21 +139,8 @@ void __show_regs(struct pt_regs *regs) #ifndef CONFIG_CPU_V7M { - unsigned int domain = get_domain(); const char *segment; -#ifdef CONFIG_CPU_SW_DOMAIN_PAN - /* - * Get the domain register for the parent context. In user - * mode, we don't save the DACR, so lets use what it should - * be. For other modes, we place it after the pt_regs struct. - */ - if (user_mode(regs)) - domain = DACR_UACCESS_ENABLE; - else - domain = *(unsigned int *)(regs + 1); -#endif - if ((domain & domain_mask(DOMAIN_USER)) == domain_val(DOMAIN_USER, DOMAIN_NOACCESS)) segment = "none"; @@ -163,11 +166,11 @@ void __show_regs(struct pt_regs *regs) buf[0] = '\0'; #ifdef CONFIG_CPU_CP15_MMU { - unsigned int transbase, dac = get_domain(); + unsigned int transbase; asm("mrc p15, 0, %0, c2, c0\n\t" : "=r" (transbase)); snprintf(buf, sizeof(buf), " Table: %08x DAC: %08x", - transbase, dac); + transbase, domain); } #endif asm("mrc p15, 0, %0, c1, c0\n" : "=r" (ctrl));