[PATCHv2] arm: Preserve TPIDRURW on context switch

Message ID 517168BB.3070903@dawncrow.de (mailing list archive)
State New, archived

Commit Message

André Hentschel April 19, 2013, 3:54 p.m. UTC
From: André Hentschel <nerv@dawncrow.de>

More and more applications are coming to WinRT. Wine could support them,
but most of them expect the thread environment block (TEB) in TPIDRURW.
This register must be preserved per thread instead of being cleared.

Signed-off-by: André Hentschel <nerv@dawncrow.de>

---
This patch is against a86d52667d8eda5de39393ce737794403bdce1eb

I could only test it with kernel 3.4.6

 arch/arm/include/asm/thread_info.h |    2 +-
 arch/arm/include/asm/tls.h         |   32 +++++++++++++++++++-------------
 arch/arm/kernel/entry-armv.S       |    9 ++++-----
 arch/arm/kernel/process.c          |    2 +-
 arch/arm/kernel/ptrace.c           |    2 +-
 arch/arm/kernel/traps.c            |    4 ++--
 6 files changed, 28 insertions(+), 23 deletions(-)

Comments

Russell King - ARM Linux April 22, 2013, 2:36 p.m. UTC | #1
On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
> From: André Hentschel <nerv@dawncrow.de>
> 
> More and more applications are coming to WinRT. Wine could support them,
> but most of them expect the thread environment block (TEB) in TPIDRURW.
> This register must be preserved per thread instead of being cleared.
> 
> Signed-off-by: André Hentschel <nerv@dawncrow.de>

This actually makes things less efficient all round, because you
now use the value immediately after loading, which means it will cause
pipeline stalls, certainly on older CPUs.

Could you please rework the patch to try avoiding so many modifications
to the way things have been done here?
Will Deacon April 22, 2013, 3:18 p.m. UTC | #2
On Mon, Apr 22, 2013 at 03:36:16PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
> > From: André Hentschel <nerv@dawncrow.de>
> > 
> > More and more applications are coming to WinRT. Wine could support them,
> > but most of them expect the thread environment block (TEB) in TPIDRURW.
> > This register must be preserved per thread instead of being cleared.
> > 
> > Signed-off-by: André Hentschel <nerv@dawncrow.de>
> 
> This actually makes things less efficient all round, because you
> now use the value immediately after loading, which means it will cause
> pipeline stalls, certainly on older CPUs.
> 
> > Could you please rework the patch to try avoiding so many modifications
> to the way things have been done here?

copy_thread also needs updating so that the *register* value for the parent
is copied to the child, since the parent may have written the register
after the last context-switch, meaning that tp_value is out-of-date.

Will
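
The stale-value concern raised above can be modelled in plain C. This is only a userspace sketch under stated assumptions: `hw_tpidrurw` is a variable standing in for the real CP15 register, and `ti_model`, `copy_thread_cached` and `copy_thread_live` are hypothetical names, not kernel functions. It shows why copying the cached `tp_value` slot at fork time can hand the child an out-of-date value, while sampling the live register does not.

```c
/* Stand-in for the hardware TPIDRURW register (assumption: a plain
 * variable; on real hardware this is a CP15 c13 register). */
static unsigned long hw_tpidrurw;

/* Hypothetical model of the patched thread_info TLS slots. */
struct ti_model {
    unsigned long tp_value[2]; /* [0] = r/o TLS, [1] = cached r/w TLS */
};

/* Naive fork path: copies the cached slot, which is only refreshed
 * when the parent is switched out. */
static void copy_thread_cached(struct ti_model *child,
                               const struct ti_model *parent)
{
    *child = *parent;
}

/* Fork path that samples the live register, since the running parent
 * may have written it after the last context switch. */
static void copy_thread_live(struct ti_model *child,
                             const struct ti_model *parent)
{
    *child = *parent;
    child->tp_value[1] = hw_tpidrurw;
}
```

If the parent's cached slot holds an old value and the parent has since written a new one to the register, only `copy_thread_live` gives the child the new value.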
André Hentschel April 22, 2013, 9:07 p.m. UTC | #3
On 22.04.2013 17:18, Will Deacon wrote:
> On Mon, Apr 22, 2013 at 03:36:16PM +0100, Russell King - ARM Linux wrote:
>> On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
>>> From: André Hentschel <nerv@dawncrow.de>
>>>
>>> More and more applications are coming to WinRT. Wine could support them,
>>> but most of them expect the thread environment block (TEB) in TPIDRURW.
>>> This register must be preserved per thread instead of being cleared.
>>>
>>> Signed-off-by: André Hentschel <nerv@dawncrow.de>
>>
>> This actually makes things less efficient all round, because you
>> now use the value immediately after loading, which means it will cause
>> pipeline stalls, certainly on older CPUs.
>>
>> Could you please rework the patch to try avoiding so many modifications
>> to the way things have been done here?
> 
> copy_thread also needs updating so that the *register* value for the parent
> is copied to the child, since the parent may have written the register
> after the last context-switch, meaning that tp_value is out-of-date.

Thank you both for reviewing.

I guess you mostly mean "ldr	r6, [r2, #TI_CPU_DOMAIN]".
I just thought about old CPUs and remembered that we at Wine
need this patch only on v7 (and later). So is it OK to introduce a set_tls_v7
in tls.h and use a CONFIG_CPU_V7 compile-time check in
the changed files and in the copy_thread function?
Do I need any further flag checks in copy_thread, or can I use the
compile-time check to add unconditional code?
Will Deacon April 23, 2013, 9:15 a.m. UTC | #4
On Mon, Apr 22, 2013 at 10:07:35PM +0100, André Hentschel wrote:
> On 22.04.2013 17:18, Will Deacon wrote:
> > On Mon, Apr 22, 2013 at 03:36:16PM +0100, Russell King - ARM Linux wrote:
> >> On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
> >>> From: André Hentschel <nerv@dawncrow.de>
> >>>
> >>> More and more applications are coming to WinRT. Wine could support them,
> >>> but most of them expect the thread environment block (TEB) in TPIDRURW.
> >>> This register must be preserved per thread instead of being cleared.
> >>>
> >>> Signed-off-by: André Hentschel <nerv@dawncrow.de>
> >>
> >> This actually makes things less efficient all round, because you
> >> now use the value immediately after loading, which means it will cause
> >> pipeline stalls, certainly on older CPUs.
> >>
> >> Could you please rework the patch to try avoiding so many modifications
> >> to the way things have been done here?
> > 
> > copy_thread also needs updating so that the *register* value for the parent
> > is copied to the child, since the parent may have written the register
> > after the last context-switch, meaning that tp_value is out-of-date.
> 
> Thank you both for reviewing.
> 
> I guess you mostly mean "ldr	r6, [r2, #TI_CPU_DOMAIN]".
> I just thought about old CPUs and remembered that we at Wine
> need this patch only on v7 (and later). So is it OK to introduce a set_tls_v7
> in tls.h and use a CONFIG_CPU_V7 compile-time check in
> the changed files and in the copy_thread function?

No, we should support this feature on any CPU with the TPIDRURW register,
otherwise it's going to get really confusing for userspace.

> Do I need any further flag checks in copy_thread, or can I use the
> compile-time check to add unconditional code?

You could introduce `get' tls functions, which don't do anything for CPUs
without the relevant registers.

Will
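
One way to read the `get' suggestion above is an accessor that returns the register value where the register exists and is a no-op elsewhere, so callers such as copy_thread need no extra flag checks. The sketch below is a userspace model, not kernel code: `cpu_has_tls_reg`, `model_tpidrurw`, `get_tpuser_model` and `save_tpuser_model` are hypothetical names, and the register is simulated with a variable.

```c
#include <stdbool.h>

/* Assumption: a runtime feature flag, similar in spirit to a hwcap test. */
static bool cpu_has_tls_reg;

/* Stand-in for the hardware TPIDRURW register. */
static unsigned long model_tpidrurw;

/* Returns the user r/w TLS value, or 0 on CPUs without the register. */
static unsigned long get_tpuser_model(void)
{
    if (!cpu_has_tls_reg)
        return 0;
    return model_tpidrurw;
}

/* Refreshes the cached r/w slot; leaves it untouched when the CPU has
 * no such register, so the call can be made unconditionally. */
static void save_tpuser_model(unsigned long tp_value[2])
{
    if (cpu_has_tls_reg)
        tp_value[1] = get_tpuser_model();
}
```

With this shape the feature check lives in one place, and code that calls the accessor stays the same on every CPU variant.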

Patch

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index cddda1f..bb5b48d 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -58,7 +58,7 @@  struct thread_info {
 	struct cpu_context_save	cpu_context;	/* cpu context */
 	__u32			syscall;	/* syscall number */
 	__u8			used_cp[16];	/* thread used copro */
-	unsigned long		tp_value;
+	unsigned long		tp_value[2];
 #ifdef CONFIG_CRUNCH
 	struct crunch_state	crunchstate;
 #endif
diff --git a/arch/arm/include/asm/tls.h b/arch/arm/include/asm/tls.h
index 73409e6..ea0189e 100644
--- a/arch/arm/include/asm/tls.h
+++ b/arch/arm/include/asm/tls.h
@@ -2,29 +2,35 @@ 
 #define __ASMARM_TLS_H
 
 #ifdef __ASSEMBLY__
-	.macro set_tls_none, tp, tmp1, tmp2
+	.macro set_tls_none, ntp, ptp, tmp1, tmp2
 	.endm
 
-	.macro set_tls_v6k, tp, tmp1, tmp2
-	mcr	p15, 0, \tp, c13, c0, 3		@ set TLS register
-	mov	\tmp1, #0
-	mcr	p15, 0, \tmp1, c13, c0, 2	@ clear user r/w TLS register
+	.macro set_tls_v6k, ntp, ptp, tmp1, tmp2
+	mrc	p15, 0, \tmp2, c13, c0, 2		@ get user r/w TLS register
+	str	\tmp2, [\ptp, #4]
+	ldrd	\tmp1, \tmp2, [\ntp]
+	mcr	p15, 0, \tmp1, c13, c0, 3	@ set user r/o TLS register
+	mcr	p15, 0, \tmp2, c13, c0, 2	@ set user r/w TLS register
 	.endm
 
-	.macro set_tls_v6, tp, tmp1, tmp2
+	.macro set_tls_v6, ntp, ptp, tmp1, tmp2
 	ldr	\tmp1, =elf_hwcap
 	ldr	\tmp1, [\tmp1, #0]
 	mov	\tmp2, #0xffff0fff
 	tst	\tmp1, #HWCAP_TLS		@ hardware TLS available?
-	mcrne	p15, 0, \tp, c13, c0, 3		@ yes, set TLS register
-	movne	\tmp1, #0
-	mcrne	p15, 0, \tmp1, c13, c0, 2	@ clear user r/w TLS register
-	streq	\tp, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
+	mrcne	p15, 0, \tmp2, c13, c0, 2		@ get user r/w TLS register
+	strne	\tmp2, [\ptp, #4]
+	ldrdne	\tmp1, \tmp2, [\ntp]
+	ldreq	\tmp1, [\ntp]
+	mcrne	p15, 0, \tmp1, c13, c0, 3	@ yes, set user r/o TLS register
+	mcrne	p15, 0, \tmp2, c13, c0, 2	@ set user r/w TLS register
+	streq	\tmp1, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
 	.endm
 
-	.macro set_tls_software, tp, tmp1, tmp2
-	mov	\tmp1, #0xffff0fff
-	str	\tp, [\tmp1, #-15]		@ set TLS value at 0xffff0ff0
+	.macro set_tls_software, ntp, ptp, tmp1, tmp2
+	ldr	\tmp1, [\ntp]
+	mov	\tmp2, #0xffff0fff
+	str	\tmp1, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
 	.endm
 #endif
 
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 0f82098..78ce1c6 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -728,21 +728,20 @@  ENTRY(__switch_to)
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
 	add	ip, r1, #TI_CPU_SAVE
-	ldr	r3, [r2, #TI_TP_VALUE]
  ARM(	stmia	ip!, {r4 - sl, fp, sp, lr} )	@ Store most regs on stack
  THUMB(	stmia	ip!, {r4 - sl, fp}	   )	@ Store most regs on stack
  THUMB(	str	sp, [ip], #4		   )
  THUMB(	str	lr, [ip], #4		   )
-#ifdef CONFIG_CPU_USE_DOMAINS
-	ldr	r6, [r2, #TI_CPU_DOMAIN]
-#endif
-	set_tls	r3, r4, r5
+	add	r3, r2, #TI_TP_VALUE
+	add	r4, r1, #TI_TP_VALUE
+	set_tls	r3, r4, r6, r7
 #if defined(CONFIG_CC_STACKPROTECTOR) && !defined(CONFIG_SMP)
 	ldr	r7, [r2, #TI_TASK]
 	ldr	r8, =__stack_chk_guard
 	ldr	r7, [r7, #TSK_STACK_CANARY]
 #endif
 #ifdef CONFIG_CPU_USE_DOMAINS
+	ldr	r6, [r2, #TI_CPU_DOMAIN]
 	mcr	p15, 0, r6, c3, c0, 0		@ Set domain register
 #endif
 	mov	r5, r0
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 047d3e4..b3171c4 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -395,7 +395,7 @@  copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	clear_ptrace_hw_breakpoint(p);
 
 	if (clone_flags & CLONE_SETTLS)
-		thread->tp_value = childregs->ARM_r3;
+		thread->tp_value[0] = childregs->ARM_r3;
 
 	thread_notify(THREAD_NOTIFY_COPY, thread);
 
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 03deeff..2bc1514 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -849,7 +849,7 @@  long arch_ptrace(struct task_struct *child, long request,
 #endif
 
 		case PTRACE_GET_THREAD_AREA:
-			ret = put_user(task_thread_info(child)->tp_value,
+			ret = put_user(task_thread_info(child)->tp_value[0],
 				       datap);
 			break;
 
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1c08911..f9d6259 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -588,7 +588,7 @@  asmlinkage int arm_syscall(int no, struct pt_regs *regs)
 		return regs->ARM_r0;
 
 	case NR(set_tls):
-		thread->tp_value = regs->ARM_r0;
+		thread->tp_value[0] = regs->ARM_r0;
 		if (tls_emu)
 			return 0;
 		if (has_tls_reg) {
@@ -706,7 +706,7 @@  static int get_tp_trap(struct pt_regs *regs, unsigned int instr)
 	int reg = (instr >> 12) & 15;
 	if (reg == 15)
 		return 1;
-	regs->uregs[reg] = current_thread_info()->tp_value;
+	regs->uregs[reg] = current_thread_info()->tp_value[0];
 	regs->ARM_pc += 4;
 	return 0;
 }
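
The save and restore sequence that the patched set_tls_v6k macro performs can be followed more easily in a small C model. This is a sketch under assumptions, not the kernel implementation: the two registers are simulated with the variables `tpidruro` and `tpidrurw`, and `thread_info_model` and `switch_tls` are hypothetical names chosen for illustration. Only the two-slot `tp_value` layout is taken from the patch.

```c
/* Stand-ins for the CP15 c13 thread ID registers (assumption: plain
 * variables instead of mrc/mcr accesses). */
static unsigned long tpidruro; /* user read-only TLS register  */
static unsigned long tpidrurw; /* user read/write TLS register */

/* Mirrors the patched layout: [0] = r/o TLS value, [1] = r/w TLS value. */
struct thread_info_model {
    unsigned long tp_value[2];
};

/* Model of the patched switch path: save the outgoing thread's r/w
 * value, then install both values of the incoming thread. */
static void switch_tls(struct thread_info_model *prev,
                       const struct thread_info_model *next)
{
    prev->tp_value[1] = tpidrurw;  /* mrc + str in the macro      */
    tpidruro = next->tp_value[0];  /* mcr ..., c13, c0, 3         */
    tpidrurw = next->tp_value[1];  /* mcr ..., c13, c0, 2         */
}
```

In this model, a thread that writes `tpidrurw`, is switched out, and is later switched back in sees its own value again, whereas the pre-patch code would have left the register cleared to 0.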