diff mbox

Domain faults when CONFIG_CPU_SW_DOMAIN_PAN is enabled

Message ID 20151203110041.GK8644@n2100.arm.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Russell King - ARM Linux Dec. 3, 2015, 11 a.m. UTC
On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote:
> I wrote:
> > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the
> > following (or very similar) on boot.
> 
> I should have said "if I don't disable", as the option is "default y".
> 
> Also, if it survives on boot, below is an example of later trouble (after
> 30+ minutes on this occasion).

Please apply this patch so we (might) get a slightly better oops dump,
and then try to reproduce.

Comments

Peter Rosin Dec. 3, 2015, 11:38 a.m. UTC | #1
Russell King wrote:
> On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote:
> > I wrote:
> > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the
> > > following (or very similar) on boot.
> > 
> > I should have said "if I don't disable", as the option is "default y".
> > 
> > Also, if it survives on boot, below is an example of later trouble (after
> > 30+ minutes on this occasion).
> 
> Please apply this patch so we (might) get a slightly better oops dump,
> and then try to reproduce.

Sure thing, but it's still "DAC: 00000051"...

Cheers,
Peter

Unhandled fault: page domain fault (0x81b) at 0x00086578
pgd = c280c000
[00086578] *pgd=22f77831, *pte=2347034f, *ppte=2347083f
Internal error: : 81b [#1] ARM
Modules linked in:
CPU: 0 PID: 904 Comm: ntpd Not tainted 4.3.0+ #30
Hardware name: Atmel SAMA5
task: c39bb500 ti: c2fca000 task.ti: c2fca000
PC is at memcpy+0x50/0x330
LR is at 0x1
pc : [<c01daff0>]    lr : [<00000001>]    psr: 800f0013
sp : c2fcbed4  ip : 00000007  fp : 000d3808
r10: 00000000  r9 : 00000080  r8 : 00000001
r7 : 00000010  r6 : 00000010  r5 : 003f5c28  r4 : 00000000
r3 : 00000002  r2 : ffffffe0  r1 : c2fcbf38  r0 : 00086578
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 2280c059  DAC: 00000051
Process ntpd (pid: 904, stack limit = 0xc2fca208)
Stack: (0xc2fcbed4 to 0xc2fcc000)
bec0:                                              00000000 c2fca000 00000000
bee0: c2fcbf18 00086578 00086578 c01e76c4 c2f77218 c2ff70f4 000d4344 00086578
bf00: 00000000 00000051 0000007c c0010224 c2fca000 c004c988 00000002 00000000
bf20: 003f5c28 00000010 00000010 00000001 00000007 00000001 01f40000 5660283b
bf40: 000ae9dc 00002710 00000000 00000000 00000000 00000000 00000000 00000000
bf60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf80: 00000000 00000000 00000000 00000000 00000000 00000000 000d3968 0005e38f
bfa0: 000d4344 c0010060 000d3968 0005e38f 00086578 00000000 ffffffff 00000001
bfc0: 000d3968 0005e38f 000d4344 0000007c 000d3814 00000001 000d4338 000d3808
bfe0: 00081e7c bec3e8b4 00026d3d b6ced126 00000030 00086578 8499c1b5 48743d49
[<c01daff0>] (memcpy) from [<c01e76c4>] (__copy_to_user_memcpy+0x138/0x17c)
[<c01e76c4>] (__copy_to_user_memcpy) from [<c004c988>] (SyS_adjtimex+0xd4/0xf0)
[<c004c988>] (SyS_adjtimex) from [<c0010060>] (ret_fast_syscall+0x0/0x3c)
Code: f5d1f05c f5d1f07c e8b151f8 e2522020 (e8a051f8)
---[ end trace bd9256bb17081c58 ]---
Russell King - ARM Linux Dec. 3, 2015, 11:51 a.m. UTC | #2
On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote:
> Russell King wrote:
> > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote:
> > > I wrote:
> > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the
> > > > following (or very similar) on boot.
> > > 
> > > I should have said "if I don't disable", as the option is "default y".
> > > 
> > > Also, if it survives on boot, below is an example of later trouble (after
> > > 30+ minutes on this occasion).
> > 
> > Please apply this patch so we (might) get a slightly better oops dump,
> > and then try to reproduce.
> 
> Sure thing, but it's still "DAC: 00000051"...

Thanks, that confirms that something in the uaccess-with-memcpy code
is clearing the DACR back to 0x51.

This has come up several times before, and I really can't spot the
problem in this code, so I've always said to disable the
uaccess-with-memcpy code.  Personally, I'd like to see the back
of that code...
Peter Rosin Dec. 3, 2015, 12:08 p.m. UTC | #3
Russell King wrote:
> On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote:
> > Russell King wrote:
> > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote:
> > > > I wrote:
> > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the
> > > > > following (or very similar) on boot.
> > > > 
> > > > I should have said "if I don't disable", as the option is "default y".
> > > > 
> > > > Also, if it survives on boot, below is an example of later trouble (after
> > > > 30+ minutes on this occasion).
> > > 
> > > Please apply this patch so we (might) get a slightly better oops dump,
> > > and then try to reproduce.
> > 
> > Sure thing, but it's still "DAC: 00000051"...
> 
> Thanks, that confirms that something in the uaccess-with-memcpy code
> is clearing the DACR back to 0x51.
> 
> This has come up several times before, and I really can't spot the
> problem in this code, so I've always said to disable the
> uaccess-with-memcpy code.  Personally, I'd like to see the back
> of that code...

Ok, but it's not me doing crazy things if that's what you are implying
(not saying that you are), because the sama5_defconfig has

	CONFIG_UACCESS_WITH_MEMCPY=y

So, something needs to happen or sama5 remains default-broken.

Cheers,
Peter
Russell King - ARM Linux Dec. 3, 2015, 1:37 p.m. UTC | #4
On Thu, Dec 03, 2015 at 12:08:11PM +0000, Peter Rosin wrote:
> Russell King wrote:
> > On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote:
> > > Russell King wrote:
> > > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote:
> > > > > I wrote:
> > > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the
> > > > > > following (or very similar) on boot.
> > > > > 
> > > > > I should have said "if I don't disable", as the option is "default y".
> > > > > 
> > > > > Also, if it survives on boot, below is an example of later trouble (after
> > > > > 30+ minutes on this occasion).
> > > > 
> > > > Please apply this patch so we (might) get a slightly better oops dump,
> > > > and then try to reproduce.
> > > 
> > > Sure thing, but it's still "DAC: 00000051"...
> > 
> > Thanks, that confirms that something in the uaccess-with-memcpy code
> > is clearing the DACR back to 0x51.
> > 
> > This has come up several times before, and I really can't spot the
> > problem in this code, so I've always said to disable the
> > uaccess-with-memcpy code.  Personally, I'd like to see the back
> > of that code...
> 
> Ok, but it's not me doing crazy things if that's what you are implying
> (not saying that you are), because the sama5_defconfig has
> 
> 	CONFIG_UACCESS_WITH_MEMCPY=y
> 
> So, something needs to happen or sama5 remains default-broken.

I have no solution for this, other than saying that uaccess-with-memcpy
seems to be (for some unknown reason) incompatible with SW PAN.

Out of the two features, I'd go for SW PAN over uaccess-with-memcpy,
but others will have a different opinion.  The real solution is to
track down what's going on and why, but I don't think anyone has the
motivation to do that.

I've looked into this problem, and I've been unable to identify what's
going on here.  I can see no reason for the DACR to be set to 0x51 in
this code.

The entry path into this code is via __copy_to_user(), which saves and
sets the DACR to 0x55 before calling arm_copy_to_user(), which for
uaccess-with-memcpy() is in arch/arm/lib/uaccess_with_memcpy.c.  That
tail-calls into __copy_to_user_memcpy(), which should run with the
DACR set to 0x55.

The only place that the DACR is changed is inside __put_user(), which
saves the DACR before also setting it to 0x55, and then restores the
old DACR value, which, because the old value was 0x55, will have no
effect.

So, I can see no way that the DACR should ever be 0x51 inside
__copy_to_user_memcpy(), but you are seeing such a scenario.  I've no
idea how you could ever get a value of 0x51 out of the DACR within this
code.
Peter Rosin Dec. 3, 2015, 4:12 p.m. UTC | #5
Russell King wrote:
> On Thu, Dec 03, 2015 at 12:08:11PM +0000, Peter Rosin wrote:
> > Russell King wrote:
> > > On Thu, Dec 03, 2015 at 11:38:20AM +0000, Peter Rosin wrote:
> > > > Russell King wrote:
> > > > > On Thu, Dec 03, 2015 at 08:33:13AM +0000, Peter Rosin wrote:
> > > > > > I wrote:
> > > > > > > If I enable CONFIG_CPU_SW_DOMAIN_PAN, I sometimes (but not always) get the
> > > > > > > following (or very similar) on boot.
> > > > > > 
> > > > > > I should have said "if I don't disable", as the option is "default y".
> > > > > > 
> > > > > > Also, if it survives on boot, below is an example of later trouble (after
> > > > > > 30+ minutes on this occasion).
> > > > > 
> > > > > Please apply this patch so we (might) get a slightly better oops dump,
> > > > > and then try to reproduce.
> > > > 
> > > > Sure thing, but it's still "DAC: 00000051"...
> > > 
> > > Thanks, that confirms that something in the uaccess-with-memcpy code
> > > is clearing the DACR back to 0x51.
> > > 
> > > This has come up several times before, and I really can't spot the
> > > problem in this code, so I've always said to disable the
> > > uaccess-with-memcpy code.  Personally, I'd like to see the back
> > > of that code...
> > 
> > Ok, but it's not me doing crazy things if that's what you are implying
> > (not saying that you are), because the sama5_defconfig has
> > 
> > 	CONFIG_UACCESS_WITH_MEMCPY=y
> > 
> > So, something needs to happen or sama5 remains default-broken.
> 
> I have no solution for this, other than saying that uaccess-with-memcpy
> seems to be (for some unknown reason) incompatible with SW PAN.
> 
> Out of the two features, I'd go for SW PAN over uaccess-with-memcpy,
> but others will have a different opinion.  The real solution is to
> track down what's going on and why, but I don't think anyone has the
> motivation to do that.
> 
> I've looked into this problem, and I've been unable to identify what's
> going on here.  I can see no reason for the DACR to be set to 0x51 in
> this code.
> 
> The entry path into this code is via __copy_to_user(), which saves and
> sets the DACR to 0x55 before calling arm_copy_to_user(), which for
> uaccess-with-memcpy() is in arch/arm/lib/uaccess_with_memcpy.c.  That
> tail-calls into __copy_to_user_memcpy(), which should run with the
> DACR set to 0x55.
> 
> The only place that the DACR is changed is inside __put_user(), which
> saves the DACR before also setting it to 0x55, and then restores the
> old DACR value, which, because the old value was 0x55, will have no
> effect.
> 
> So, I can see no way that the DACR should ever be 0x51 inside
> __copy_to_user_memcpy(), but you are seeing such a scenario.  I've no
> idea how you could ever get a value of 0x51 out of the DACR within this
> code.

Since it seems like a race is at the bottom of the observed problems, I'm
going to look for things that look racy. The things that stand out to me
are:

* uaccess.h:modify_domain() does a read-modify-write on DACR using
  get_domain and set_domain, and I don't see any locking. Is that
  safe? Why?

* uaccess_with_memcpy.c:__copy_to_user() has a mode in which it copies
  "non-atomically" (if faulthandler_disabled() returns 0). If a fault
  happens during __copy_to_user, what prevents some other thread from
  clobbering DACR?

* In uaccess.h:uaccess_save_and_enable(), what prevents a context
  switch between the get_domain and set_domain calls?

Just asking questions, I have no prior experience with this code...

Cheers,
Peter
Russell King - ARM Linux Dec. 3, 2015, 4:41 p.m. UTC | #6
On Thu, Dec 03, 2015 at 04:12:06PM +0000, Peter Rosin wrote:
> Since it seems like a race is at the bottom of the observed problems, I'm
> going to look for things that look racy. The things that stand out to me
> are:
> 
> * uaccess.h:modify_domain() does a read-modify-write on DACR using
>   get_domain and set_domain, and I don't see any locking. Is that
>   safe? Why?

It's safe:
* the DACR is per-CPU
* all exceptions preserve the original DACR value when they return.
  This is done by storing the DACR value at entry onto the stack, along
  with the register set, and restoring it along with the register set
  on exit from exception processing, as if "nothing ever happened".
  This includes if the exception processing caused a switch to another
  thread.

> * uaccess_with_memcpy.c:__copy_to_user() has a mode in which it copies
>   "non-atomically" (if faulthandler_disabled() returns 0). If a fault
>   happens during __copy_to_user, what prevents some other thread from
>   clobbering DACR?

See the second point above.  Moreover, if we sleep in down_read(),
then __switch_to() reads the current DACR value and saves it in the
thread information, and will restore that value when resuming the
thread - even if the thread has been migrated to a different CPU.

> * In uaccess.h:uaccess_save_and_enable(), what prevents a context
>   switch between the get_domain and set_domain calls?

Nothing, but it doesn't matter, because the DACR register is saved
and restored to preserve its value across all exceptions and thread
switches.

I suspect the only way to nail this down is to litter the uaccess
code (virtually every alternate line) with:

	BUG_ON(get_domain() & domain_mask(DOMAIN_USER) ==
	       domain_val(DOMAIN_USER, DOMAIN_NOACCESS));

to narrow down the exact point where the domain register seemingly
gets reset.  Maybe it'll provide some hint.
diff mbox

Patch

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 7a7c4ce..92789ce 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -95,6 +95,22 @@  void __show_regs(struct pt_regs *regs)
 {
 	unsigned long flags;
 	char buf[64];
+#ifndef CONFIG_CPU_V7M
+	unsigned int domain;
+#ifdef CONFIG_CPU_SW_DOMAIN_PAN
+	/*
+	 * Get the domain register for the parent context. In user
+	 * mode, we don't save the DACR, so lets use what it should
+	 * be. For other modes, we place it after the pt_regs struct.
+	 */
+	if (user_mode(regs))
+		domain = DACR_UACCESS_ENABLE;
+	else
+		domain = *(unsigned int *)(regs + 1);
+#else
+	domain = get_domain();
+#endif
+#endif
 
 	show_regs_print_info(KERN_DEFAULT);
 
@@ -123,21 +139,8 @@  void __show_regs(struct pt_regs *regs)
 
 #ifndef CONFIG_CPU_V7M
 	{
-		unsigned int domain = get_domain();
 		const char *segment;
 
-#ifdef CONFIG_CPU_SW_DOMAIN_PAN
-		/*
-		 * Get the domain register for the parent context. In user
-		 * mode, we don't save the DACR, so lets use what it should
-		 * be. For other modes, we place it after the pt_regs struct.
-		 */
-		if (user_mode(regs))
-			domain = DACR_UACCESS_ENABLE;
-		else
-			domain = *(unsigned int *)(regs + 1);
-#endif
-
 		if ((domain & domain_mask(DOMAIN_USER)) ==
 		    domain_val(DOMAIN_USER, DOMAIN_NOACCESS))
 			segment = "none";
@@ -163,11 +166,11 @@  void __show_regs(struct pt_regs *regs)
 		buf[0] = '\0';
 #ifdef CONFIG_CPU_CP15_MMU
 		{
-			unsigned int transbase, dac = get_domain();
+			unsigned int transbase;
 			asm("mrc p15, 0, %0, c2, c0\n\t"
 			    : "=r" (transbase));
 			snprintf(buf, sizeof(buf), "  Table: %08x  DAC: %08x",
-			  	transbase, dac);
+			  	transbase, domain);
 		}
 #endif
 		asm("mrc p15, 0, %0, c1, c0\n" : "=r" (ctrl));