Message ID | 517168BB.3070903@dawncrow.de |
---|---|
State | New, archived |
On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
> From: André Hentschel <nerv@dawncrow.de>
>
> There are more and more applications coming to WinRT, Wine could support them,
> but mostly they expect to have the thread environment block (TEB) in TPIDRURW.
> This register must be preserved per thread instead of being cleared.
>
> Signed-off-by: André Hentschel <nerv@dawncrow.de>

This actually makes things less efficient all round, because you
now use the value immediately after loading, which means it will cause
pipeline stalls, certainly on older CPUs.

Could you please rework the patch to try avoiding so many modifications
to the way things have been done here?
On Mon, Apr 22, 2013 at 03:36:16PM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
> > From: André Hentschel <nerv@dawncrow.de>
> >
> > There are more and more applications coming to WinRT, Wine could support them,
> > but mostly they expect to have the thread environment block (TEB) in TPIDRURW.
> > This register must be preserved per thread instead of being cleared.
> >
> > Signed-off-by: André Hentschel <nerv@dawncrow.de>
>
> This actually makes things less efficient all round, because you
> now use the value immediately after loading, which means it will cause
> pipeline stalls, certainly on older CPUs.
>
> Could you please rework the patch to try avoiding so many modifications
> to the way things have been done here?

copy_thread also needs updating so that the *register* value for the parent
is copied to the child, since the parent may have written the register
after the last context-switch, meaning that tp_value is out-of-date.

Will
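A minimal user-space model of the fork problem described above, offered as a sketch rather than kernel code. It assumes a global variable, `sim_tpidrurw`, standing in for the live TPIDRURW register of the CPU the parent runs on, and a simplified thread structure; `struct model_thread`, `copy_tls_from_saved` and `copy_tls_from_register` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-in for the live TPIDRURW register of the CPU the
 * parent is running on; real kernel code would read it with MRC. */
static unsigned long sim_tpidrurw;

/* Hypothetical, simplified thread state: tp_value[0] is the kernel-managed
 * TLS pointer, tp_value[1] the TPIDRURW value saved at the last switch. */
struct model_thread {
	unsigned long tp_value[2];
};

/* Incomplete: copies the saved slot, which was refreshed only at the
 * parent's last context switch and can therefore be stale. */
static void copy_tls_from_saved(struct model_thread *child,
				const struct model_thread *parent)
{
	assert(child && parent);
	child->tp_value[0] = parent->tp_value[0];
	child->tp_value[1] = parent->tp_value[1];
}

/* As suggested in the review: the parent is current on this CPU, so the
 * register itself holds its up-to-date value; copy that instead. */
static void copy_tls_from_register(struct model_thread *child,
				   const struct model_thread *parent)
{
	assert(child && parent);
	child->tp_value[0] = parent->tp_value[0];
	child->tp_value[1] = sim_tpidrurw;
}
```

If the parent's saved slot holds 0xaaaa but it has since written 0xbbbb to the register from userspace, the first helper hands the child 0xaaaa and only the second hands it 0xbbbb.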
On 22.04.2013 17:18, Will Deacon wrote:
> On Mon, Apr 22, 2013 at 03:36:16PM +0100, Russell King - ARM Linux wrote:
>> On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
>>> From: André Hentschel <nerv@dawncrow.de>
>>>
>>> There are more and more applications coming to WinRT, Wine could support them,
>>> but mostly they expect to have the thread environment block (TEB) in TPIDRURW.
>>> This register must be preserved per thread instead of being cleared.
>>>
>>> Signed-off-by: André Hentschel <nerv@dawncrow.de>
>>
>> This actually makes things less efficient all round, because you
>> now use the value immediately after loading, which means it will cause
>> pipeline stalls, certainly on older CPUs.
>>
>> Could you please rework the patch to try avoiding so many modifications
>> to the way things have been done here?
>
> copy_thread also needs updating so that the *register* value for the parent
> is copied to the child, since the parent may have written the register
> after the last context-switch, meaning that tp_value is out-of-date.

Thank you both for reviewing.

I guess you mostly mean "ldr r6, [r2, #TI_CPU_DOMAIN]".
I just thought about old CPUs and remembered again that we at Wine
need that patch only on v7 (and later). So is it OK to introduce a set_tls_v7
in tls.h and make use of a CONFIG_CPU_V7 compile-time check in
the changed files and in the copy_thread function?
Do I need any further flag checks in copy_thread or can I use the
compile-time check to add unconditional code?
On Mon, Apr 22, 2013 at 10:07:35PM +0100, André Hentschel wrote:
> On 22.04.2013 17:18, Will Deacon wrote:
> > On Mon, Apr 22, 2013 at 03:36:16PM +0100, Russell King - ARM Linux wrote:
> >> On Fri, Apr 19, 2013 at 05:54:35PM +0200, André Hentschel wrote:
> >>> From: André Hentschel <nerv@dawncrow.de>
> >>>
> >>> There are more and more applications coming to WinRT, Wine could support them,
> >>> but mostly they expect to have the thread environment block (TEB) in TPIDRURW.
> >>> This register must be preserved per thread instead of being cleared.
> >>>
> >>> Signed-off-by: André Hentschel <nerv@dawncrow.de>
> >>
> >> This actually makes things less efficient all round, because you
> >> now use the value immediately after loading, which means it will cause
> >> pipeline stalls, certainly on older CPUs.
> >>
> >> Could you please rework the patch to try avoiding so many modifications
> >> to the way things have been done here?
> >
> > copy_thread also needs updating so that the *register* value for the parent
> > is copied to the child, since the parent may have written the register
> > after the last context-switch, meaning that tp_value is out-of-date.
>
> Thank you both for reviewing.
>
> I guess you mostly mean "ldr r6, [r2, #TI_CPU_DOMAIN]".
> I just thought about old CPUs and remembered again that we at Wine
> need that patch only on v7 (and later). So is it OK to introduce a set_tls_v7
> in tls.h and make use of a CONFIG_CPU_V7 compile-time check in
> the changed files and in the copy_thread function?

No, we should support this feature on any CPU with the TPIDRURW register,
otherwise it's going to get really confusing for userspace.

> Do I need any further flag checks in copy_thread or can I use the
> compile-time check to add unconditional code?

You could introduce `get' TLS functions, which don't do anything for CPUs
without the relevant registers.

Will
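A sketch of the `get` helper idea under stated assumptions: `struct model_cpu`, `model_get_tpuser` and `model_copy_thread` are hypothetical names, and a plain struct field replaces the MRC read that a real kernel helper would perform. The point it illustrates is that the capability test lives in one helper, so copy_thread needs no CONFIG_CPU_V7 check or per-CPU branch of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU: whether it implements TPIDRURW and,
 * if it does, what the register currently holds. */
struct model_cpu {
	bool has_tpidrurw;
	unsigned long tpidrurw;
};

/* Hypothetical per-thread state; slot [1] holds the saved TPIDRURW. */
struct model_thread {
	unsigned long tp_value[2];
};

/* "get" helper: on CPUs without the register it reads nothing and
 * returns 0, so callers need no CPU-type checks of their own. */
static unsigned long model_get_tpuser(const struct model_cpu *cpu)
{
	assert(cpu);
	return cpu->has_tpidrurw ? cpu->tpidrurw : 0;
}

/* Fork-time copy: the parent is running on `cpu`, so its live register
 * value (or 0 where there is no register) is given to the child. */
static void model_copy_thread(struct model_thread *child,
			      const struct model_cpu *cpu)
{
	assert(child);
	child->tp_value[1] = model_get_tpuser(cpu);
}
```

The same call site then works unchanged on a CPU with the register and on one without it; only the helper knows the difference.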
```diff
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index cddda1f..bb5b48d 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -58,7 +58,7 @@ struct thread_info {
 	struct cpu_context_save	cpu_context;	/* cpu context */
 	__u32			syscall;	/* syscall number */
 	__u8			used_cp[16];	/* thread used copro */
-	unsigned long		tp_value;
+	unsigned long		tp_value[2];
 #ifdef CONFIG_CRUNCH
 	struct crunch_state	crunchstate;
 #endif
diff --git a/arch/arm/include/asm/tls.h b/arch/arm/include/asm/tls.h
index 73409e6..ea0189e 100644
--- a/arch/arm/include/asm/tls.h
+++ b/arch/arm/include/asm/tls.h
@@ -2,29 +2,35 @@
 #define __ASMARM_TLS_H
 
 #ifdef __ASSEMBLY__
-	.macro set_tls_none, tp, tmp1, tmp2
+	.macro set_tls_none, ntp, ptp, tmp1, tmp2
 	.endm
 
-	.macro set_tls_v6k, tp, tmp1, tmp2
-	mcr	p15, 0, \tp, c13, c0, 3		@ set TLS register
-	mov	\tmp1, #0
-	mcr	p15, 0, \tmp1, c13, c0, 2	@ clear user r/w TLS register
+	.macro set_tls_v6k, ntp, ptp, tmp1, tmp2
+	mrc	p15, 0, \tmp2, c13, c0, 2	@ get user r/w TLS register
+	str	\tmp2, [\ptp, #4]
+	ldrd	\tmp1, \tmp2, [\ntp]
+	mcr	p15, 0, \tmp1, c13, c0, 3	@ set user r/o TLS register
+	mcr	p15, 0, \tmp2, c13, c0, 2	@ set user r/w TLS register
 	.endm
 
-	.macro set_tls_v6, tp, tmp1, tmp2
+	.macro set_tls_v6, ntp, ptp, tmp1, tmp2
 	ldr	\tmp1, =elf_hwcap
 	ldr	\tmp1, [\tmp1, #0]
 	mov	\tmp2, #0xffff0fff
 	tst	\tmp1, #HWCAP_TLS		@ hardware TLS available?
-	mcrne	p15, 0, \tp, c13, c0, 3		@ yes, set TLS register
-	movne	\tmp1, #0
-	mcrne	p15, 0, \tmp1, c13, c0, 2	@ clear user r/w TLS register
-	streq	\tp, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
+	mrcne	p15, 0, \tmp2, c13, c0, 2	@ get user r/w TLS register
+	strne	\tmp2, [\ptp, #4]
+	ldrdne	\tmp1, \tmp2, [\ntp]
+	ldreq	\tmp1, [\ntp]
+	mcrne	p15, 0, \tmp1, c13, c0, 3	@ yes, set user r/o TLS register
+	mcrne	p15, 0, \tmp2, c13, c0, 2	@ set user r/w TLS register
+	streq	\tmp1, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
 	.endm
 
-	.macro set_tls_software, tp, tmp1, tmp2
-	mov	\tmp1, #0xffff0fff
-	str	\tp, [\tmp1, #-15]		@ set TLS value at 0xffff0ff0
+	.macro set_tls_software, ntp, ptp, tmp1, tmp2
+	ldr	\tmp1, [\ntp]
+	mov	\tmp2, #0xffff0fff
+	str	\tmp1, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
 	.endm
 #endif
 
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 0f82098..78ce1c6 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -728,21 +728,20 @@ ENTRY(__switch_to)
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
 	add	ip, r1, #TI_CPU_SAVE
-	ldr	r3, [r2, #TI_TP_VALUE]
  ARM(	stmia	ip!, {r4 - sl, fp, sp, lr} )	@ Store most regs on stack
  THUMB(	stmia	ip!, {r4 - sl, fp}	   )	@ Store most regs on stack
  THUMB(	str	sp, [ip], #4		   )
  THUMB(	str	lr, [ip], #4		   )
-#ifdef CONFIG_CPU_USE_DOMAINS
-	ldr	r6, [r2, #TI_CPU_DOMAIN]
-#endif
-	set_tls	r3, r4, r5
+	add	r3, r2, #TI_TP_VALUE
+	add	r4, r1, #TI_TP_VALUE
+	set_tls	r3, r4, r6, r7
 #if defined(CONFIG_CC_STACKPROTECTOR) && !defined(CONFIG_SMP)
 	ldr	r7, [r2, #TI_TASK]
 	ldr	r8, =__stack_chk_guard
 	ldr	r7, [r7, #TSK_STACK_CANARY]
 #endif
 #ifdef CONFIG_CPU_USE_DOMAINS
+	ldr	r6, [r2, #TI_CPU_DOMAIN]
 	mcr	p15, 0, r6, c3, c0, 0		@ Set domain register
 #endif
 	mov	r5, r0
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 047d3e4..b3171c4 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -395,7 +395,7 @@ copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	clear_ptrace_hw_breakpoint(p);
 
 	if (clone_flags & CLONE_SETTLS)
-		thread->tp_value = childregs->ARM_r3;
+		thread->tp_value[0] = childregs->ARM_r3;
 
 	thread_notify(THREAD_NOTIFY_COPY, thread);
 
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 03deeff..2bc1514 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -849,7 +849,7 @@ long arch_ptrace(struct task_struct *child, long request,
 #endif
 
 		case PTRACE_GET_THREAD_AREA:
-			ret = put_user(task_thread_info(child)->tp_value,
+			ret = put_user(task_thread_info(child)->tp_value[0],
 				       datap);
 			break;
 
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1c08911..f9d6259 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -588,7 +588,7 @@ asmlinkage int arm_syscall(int no, struct pt_regs *regs)
 		return regs->ARM_r0;
 
 	case NR(set_tls):
-		thread->tp_value = regs->ARM_r0;
+		thread->tp_value[0] = regs->ARM_r0;
 		if (tls_emu)
 			return 0;
 		if (has_tls_reg) {
@@ -706,7 +706,7 @@ static int get_tp_trap(struct pt_regs *regs, unsigned int instr)
 	int reg = (instr >> 12) & 15;
 	if (reg == 15)
 		return 1;
-	regs->uregs[reg] = current_thread_info()->tp_value;
+	regs->uregs[reg] = current_thread_info()->tp_value[0];
 	regs->ARM_pc += 4;
 	return 0;
 }
```
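For readers following the assembly, here is a C rendering of the data flow in the patched set_tls_v6k macro. It is a model, not kernel code: `struct model_cp15`, `struct model_thread_info` and `model_set_tls_v6k` are hypothetical, and two struct fields stand in for the user read-only (c13, c0, 3) and user read/write (c13, c0, 2) thread ID registers. As in __switch_to, `prev` is the outgoing thread_info (r1, passed as ptp) and `next` the incoming one (r2, passed as ntp).

```c
#include <assert.h>

/* Hypothetical stand-ins for the two CP15 thread ID registers. */
struct model_cp15 {
	unsigned long tpidruro; /* c13, c0, 3: user read-only */
	unsigned long tpidrurw; /* c13, c0, 2: user read/write */
};

/* Mirrors the patched field: [0] = TLS pointer, [1] = saved TPIDRURW. */
struct model_thread_info {
	unsigned long tp_value[2];
};

/* Same order of operations as set_tls_v6k in the patch. */
static void model_set_tls_v6k(struct model_cp15 *cpu,
			      const struct model_thread_info *next,
			      struct model_thread_info *prev)
{
	assert(cpu && next && prev);
	/* mrc c13, c0, 2 then str [ptp, #4]: save the outgoing value */
	prev->tp_value[1] = cpu->tpidrurw;
	/* ldrd [ntp] then two mcr: install the incoming pair */
	cpu->tpidruro = next->tp_value[0];
	cpu->tpidrurw = next->tp_value[1];
}
```

Switching away from a thread that wrote TPIDRURW in userspace and back again restores the value it wrote, which is the per-thread preservation the commit message asks for. On CPUs without the hardware registers (set_tls_software, or the `eq` path of set_tls_v6) only tp_value[0] is used.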