From patchwork Tue Jan 29 07:25:26 2013
X-Patchwork-Submitter: Zefan Li
X-Patchwork-Id: 2059941
Message-ID: <51077966.1060703@huawei.com>
Date: Tue, 29 Jan 2013 15:25:26 +0800
From: Li Zefan
To: Catalin Marinas
Cc: Will Deacon, Chanho Min, linux-arm-kernel@lists.infradead.org
Subject: backport patches to 2.6.34 to remove __ARCH_WANT_INTERRUPTS_ON_CTXSW?

Hi Catalin,

We got system crashes, and we managed to find a way to trigger the bug
within minutes. We then found this commit upstream, which was also
backported to 2.6.34 stable:

commit cb297a3e433dbdcf7ad81e0564e7b804c941ff0d
Author: Chanho Min
Date:   Thu Jan 5 20:00:19 2012 +0900

    sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW

The bug described in that commit resembles ours. Unfortunately, after
applying the fix we still got a crash within hours. We then tried binding
each real-time task to a single CPU so that no CPU migration could happen,
and the system ran without any problem for ~20 hours.

We're still investigating this issue. One thing I'm doing is backporting
the patches that remove __ARCH_WANT_INTERRUPTS_ON_CTXSW.
With those patches I can boot the kernel, but it hangs when the system
starts NFS automatically at boot, and a soft lockup is reported later.
Things are fine if I disable the NFS startup and start it manually
afterwards. So did I miss something in the backport, or is it infeasible
to backport these patches to 2.6.34? We're using ARMv7.

I've attached the patches I backported.

Thanks,
Li Zefan

From d190d1262e3403d2f80bff5bd310b287a0ed9edb Mon Sep 17 00:00:00 2001
From: Catalin Marinas
Date: Mon, 28 Jan 2013 16:24:11 +0800
Subject: [PATCH 3/3] ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on
 ASID-capable CPUs

upstream 7fec1b57b8a925d83c194f995f83d9f8442fd48e commit.

Since the ASIDs must be unique to an mm across all the CPUs in a system,
the __new_context() function needs to broadcast a context reset event to
all the CPUs during ASID allocation if a roll-over occurred. Such IPIs
cannot be issued with interrupts disabled and ARM had to define
__ARCH_WANT_INTERRUPTS_ON_CTXSW.

This patch changes the check_context() function to
check_and_switch_context() called from switch_mm(). In case of
ASID-capable CPUs (ARMv6 onwards), if a new ASID is needed and the
interrupts are disabled, it defers the __new_context() and
cpu_switch_mm() calls to the post-lock switch hook where the interrupts
are enabled. Setting the reserved TTBR0 was also moved to
check_and_switch_context() from cpu_v7_switch_mm().

Reviewed-by: Will Deacon
Tested-by: Will Deacon
Reviewed-by: Frank Rowand
Tested-by: Marc Zyngier
Signed-off-by: Catalin Marinas
---
 arch/arm/include/asm/mmu_context.h | 72 ++++++++++++++++++++++++++++--------
 arch/arm/include/asm/system.h      |  2 +
 arch/arm/include/asm/thread_info.h |  1 +
 arch/arm/mm/context.c              |  2 +-
 arch/arm/mm/proc-v7.S              |  3 -
 5 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index a0b3cac..9e87627 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -49,39 +49,80 @@ DECLARE_PER_CPU(struct mm_struct *, current_mm);
 
 void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
 void __new_context(struct mm_struct *mm);
+void cpu_set_reserved_ttbr0(void);
 
-static inline void check_context(struct mm_struct *mm)
+static inline void switch_new_context(struct mm_struct *mm)
 {
-	/*
-	 * This code is executed with interrupts enabled. Therefore,
-	 * mm->context.id cannot be updated to the latest ASID version
-	 * on a different CPU (and condition below not triggered)
-	 * without first getting an IPI to reset the context. The
-	 * alternative is to take a read_lock on mm->context.id_lock
-	 * (after changing its type to rwlock_t).
-	 */
-	if (unlikely((mm->context.id ^ cpu_last_asid) >> ASID_BITS))
-		__new_context(mm);
+	unsigned long flags;
+	__new_context(mm);
+
+	local_irq_save(flags);
+	cpu_switch_mm(mm->pgd, mm);
+	local_irq_restore(flags);
+}
 
+static inline void check_and_switch_context(struct mm_struct *mm,
+					    struct task_struct *tsk)
+{
 	if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
 		__check_kvm_seq(mm);
+
+	/*
+	 * Required during context switch to avoid speculative page table
+	 * walking with the wrong TTBR.
+	 */
+	cpu_set_reserved_ttbr0();
+
+	if (!((mm->context.id ^ cpu_last_asid) >> ASID_BITS))
+		/*
+		 * The ASID is from the current generation, just switch to the
+		 * new pgd. This condition is only true for calls from
+		 * context_switch() and interrupts are already disabled.
+		 */
+		cpu_switch_mm(mm->pgd, mm);
+	else if (irqs_disabled())
+		/*
+		 * Defer the new ASID allocation until after the context
+		 * switch critical region since __new_context() cannot be
+		 * called with interrupts disabled (it sends IPIs).
+		 */
+		set_ti_thread_flag(task_thread_info(tsk), TIF_SWITCH_MM);
+	else
+		/*
+		 * That is a direct call to switch_mm() or activate_mm() with
+		 * interrupts enabled and a new context.
+		 */
+		switch_new_context(mm);
 }
 
 #define init_new_context(tsk,mm)	(__init_new_context(tsk,mm),0)
 
-#else
+#define finish_arch_post_lock_switch \
+	finish_arch_post_lock_switch
+static inline void finish_arch_post_lock_switch(void)
+{
+	if (test_and_clear_thread_flag(TIF_SWITCH_MM))
+		switch_new_context(current->mm);
+}
 
-static inline void check_context(struct mm_struct *mm)
+#else	/* !CONFIG_CPU_HAS_ASID */
+
+static inline void check_and_switch_context(struct mm_struct *mm,
+					    struct task_struct *tsk)
 {
 #ifdef CONFIG_MMU
 	if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
 		__check_kvm_seq(mm);
+	cpu_switch_mm(mm->pgd, mm);
 #endif
 }
 
 #define init_new_context(tsk,mm)	0
 
-#endif
+#define finish_arch_post_lock_switch()	do { } while (0)
+
+#endif	/* CONFIG_CPU_HAS_ASID */
 
 #define destroy_context(mm)		do { } while(0)
 
@@ -123,8 +164,7 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		struct mm_struct **crt_mm = &per_cpu(current_mm, cpu);
 		*crt_mm = next;
 #endif
-		check_context(next);
-		cpu_switch_mm(next->pgd, next);
+		check_and_switch_context(next, tsk);
 		if (cache_is_vivt())
 			cpumask_clear_cpu(cpu, mm_cpumask(prev));
 	}
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 4fa6b9b..8b2c3a2 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -240,7 +240,9 @@ static inline void set_copro_access(unsigned int val)
  * so enable interrupts over the context switch to avoid high
  * latency.
  */
+#ifndef CONFIG_CPU_HAS_ASID
 #define __ARCH_WANT_INTERRUPTS_ON_CTXSW
+#endif
 
 /*
  * switch_to(prev, next) should switch from task `prev' to `next'
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 7fe8224..b091241 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -147,6 +147,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
 #define TIF_MEMDIE		18
 #define TIF_FREEZE		19
 #define TIF_RESTORE_SIGMASK	20
+#define TIF_SWITCH_MM		21
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 0d057ea..4082a69 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -22,7 +22,7 @@ unsigned int cpu_last_asid = ASID_FIRST_VERSION;
 DEFINE_PER_CPU(struct mm_struct *, current_mm);
 #endif
 
-static void cpu_set_reserved_ttbr0(void)
+void cpu_set_reserved_ttbr0(void)
 {
 	u32 ttb;
 	/* Copy TTBR1 into TTBR0 */
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 6cd2014..43b1cae 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -112,9 +112,6 @@ ENTRY(cpu_v7_switch_mm)
 #ifdef CONFIG_ARM_ERRATA_430973
 	mcr	p15, 0, r2, c7, c5, 6		@ flush BTAC/BTB
 #endif
-	mrc	p15, 0, r2, c2, c0, 1		@ load TTB 1
-	mcr	p15, 0, r2, c2, c0, 0		@ into TTB 0
-	isb
 	mcr	p15, 0, r1, c13, c0, 1		@ set context ID
 	isb
 	mcr	p15, 0, r0, c2, c0, 0		@ set TTB 0
-- 
1.7.6
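
As a minimal userspace sketch of the deferral pattern the commit message
describes (illustrative only, not part of the patch): work that cannot run
while interrupts are disabled is flagged during the context switch and
performed later from the post-lock-switch hook. The names deferred_switch
and finish_post_lock_switch are stand-ins for the kernel's TIF_SWITCH_MM
flag and finish_arch_post_lock_switch() hook, and printf stands in for the
real ASID allocation and mm switch.

#include <stdbool.h>
#include <stdio.h>

static bool irqs_disabled;    /* stands in for the kernel's irqs_disabled() */
static bool deferred_switch;  /* stands in for the TIF_SWITCH_MM thread flag */

static void switch_new_context(void)
{
	/* In the kernel this allocates a new ASID (which may send IPIs,
	 * hence needs interrupts enabled) and switches the page tables. */
	printf("allocate new ASID and switch mm\n");
}

static void check_and_switch_context(bool need_new_asid)
{
	if (!need_new_asid)
		printf("ASID still valid, plain mm switch\n");
	else if (irqs_disabled)
		deferred_switch = true;   /* cannot send IPIs here: defer */
	else
		switch_new_context();     /* safe to do it immediately */
}

static void finish_post_lock_switch(void)
{
	/* Runs after the context switch, with interrupts enabled again. */
	if (deferred_switch) {
		deferred_switch = false;
		switch_new_context();
	}
}

int main(void)
{
	irqs_disabled = true;              /* inside context_switch() */
	check_and_switch_context(true);    /* new ASID needed -> deferred */
	irqs_disabled = false;             /* runqueue lock dropped, IRQs on */
	finish_post_lock_switch();         /* deferred work happens here */
	return 0;
}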