
BUG: commit "ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on pre-ARMv6 CPUs" breaks armv5 with CONFIG_PREEMPT

Message ID 51C2EBCD.4070206@pengutronix.de (mailing list archive)
State New, archived

Commit Message

Marc Kleine-Budde June 20, 2013, 11:47 a.m. UTC
On 06/20/2013 01:39 PM, Marc Kleine-Budde wrote:
> On 06/20/2013 01:35 PM, Marc Kleine-Budde wrote:
>> On 06/20/2013 01:12 PM, Catalin Marinas wrote:
>>> On Thu, Jun 20, 2013 at 11:28:56AM +0100, Catalin Marinas wrote:
>>>> We may need to place the preempt disable/enable at a higher level in the
>>>> scheduler. My theory is that we have a context switch from prev to next.
>>>> We get preempted just before finish_arch_post_lock_switch(), so the MMU
>>>> hasn't been switched yet. The new switch during preemption happens to a
>>>> thread with the same next mm, so the scheduler no longer calls switch_mm() and
>>>> the TIF_SWITCH_MM isn't set for the new thread.
>>>>
>>>> I'll come back with another patch shortly.
>>>
>>> Here's another attempt (as before, only compile-tested):
>>
>> booting kernel from /image
>> zImage: concatenated oftree detected
>> booting Linux kernel with devicetree
>>
>> ...dead...
>>
>> Does every process have a "mm"? Even kernel threads?

I've added a check for "mm". It boots now, and my test has been running
stable for 3 minutes.

I'm not sure if we have to check for "mm" in
check_and_switch_context(), too.
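
To make Catalin's theory above concrete, here is a user-space sketch of the race (Python, purely illustrative; the names MM, Thread and hardware are invented for this model, and none of it is kernel code):

```python
# Toy model of the deferred-mm-switch race on VIVT (illustrative only).

class MM:
    def __init__(self, name):
        self.name = name
        self.switch_pending = False      # per-mm flag (the fix)

class Thread:
    def __init__(self, name, mm):
        self.name = name
        self.mm = mm
        self.tif_switch_mm = False       # per-thread TIF flag (the buggy scheme)

def run_race(per_mm_flag):
    mm_x, mm_y = MM("X"), MM("Y")
    a, b, c = Thread("A", mm_x), Thread("B", mm_y), Thread("C", mm_y)
    hardware = {"mm": mm_x}              # which mm the MMU actually uses

    def switch_mm(next_mm, next_thread):
        # VIVT: defer the expensive cpu_switch_mm(), only record it as pending
        if per_mm_flag:
            next_mm.switch_pending = True
        else:
            next_thread.tif_switch_mm = True

    def finish_arch_post_lock_switch(current):
        if per_mm_flag:
            if current.mm and current.mm.switch_pending:
                current.mm.switch_pending = False
                hardware["mm"] = current.mm          # cpu_switch_mm()
        elif current.tif_switch_mm:
            current.tif_switch_mm = False
            hardware["mm"] = current.mm              # cpu_switch_mm()

    switch_mm(b.mm, b)     # A -> B: mm changes, the switch is deferred
    # ... B is preempted right here, before finish_arch_post_lock_switch() ...
    # B -> C: same mm, so the scheduler does NOT call switch_mm() for C
    finish_arch_post_lock_switch(c)
    return hardware["mm"].name

print(run_race(per_mm_flag=False))   # "X": MMU left on the old mm (the bug)
print(run_race(per_mm_flag=True))    # "Y": pending flag lives on the mm (the fix)
```

With the per-thread flag, the deferred switch recorded for B is simply lost once the scheduler skips switch_mm() for C; storing the flag in the mm itself survives the extra switch.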

Thanks,
Marc
---

From 306d84d5f0645a86e86d539be0546c4ac758d3d4 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Thu, 20 Jun 2013 12:12:55 +0100
Subject: [PATCH] arm: Fix deferred mm switch on VIVT processors

As of commit b9d4d42ad9 (ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on
pre-ARMv6 CPUs), the mm switching on VIVT processors is done in the
finish_arch_post_lock_switch() function to avoid whole cache flushing
with interrupts disabled. The need for deferred mm switch is stored as a
thread flag (TIF_SWITCH_MM). However, with preemption enabled, we can
have another thread switch before finish_arch_post_lock_switch(). If the
new thread has the same mm as the previous 'next' thread, the scheduler
will not call switch_mm() and the TIF_SWITCH_MM flag won't be set for
the new thread.

This patch moves the switch pending flag to the mm_context_t structure
since it is specific to the mm rather than to the thread.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
[mkl: add check for mm]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 arch/arm/include/asm/mmu.h         | 2 ++
 arch/arm/include/asm/mmu_context.h | 8 +++++---
 arch/arm/include/asm/thread_info.h | 1 -
 3 files changed, 7 insertions(+), 4 deletions(-)

Comments

Marc Kleine-Budde June 20, 2013, 12:48 p.m. UTC | #1
On 06/20/2013 01:47 PM, Marc Kleine-Budde wrote:
> [...]
> Subject: [PATCH] arm: Fix deferred mm switch on VIVT processors
> [...]

The test program has now run for 1 hour without problems. I'm going to push
the patch to our customer; if it fixes the problem there, I'll add my
Tested-by.

regards,
Marc
Catalin Marinas June 20, 2013, 1:01 p.m. UTC | #2
On Thu, Jun 20, 2013 at 12:47:25PM +0100, Marc Kleine-Budde wrote:
> [...]
> I've added a check for "mm". Boots now and my test runs stable for 3
> minutes now.

Ah, good point.

> I'm not sure if we have to check for "mm" in
> check_and_switch_context(), too.

switch_mm() wouldn't be called with a NULL mm, hence we wouldn't call
check_and_switch_context() either. finish_arch_post_lock_switch() is
called all the time (and we set the TIF flag only if switch_mm() was
called).
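
To illustrate why the mm check is needed at all (a toy Python sketch, not kernel code; MMContext and Task are invented names for this model): finish_arch_post_lock_switch() runs after every switch, including switches to kernel threads, and a kernel thread's current->mm is NULL.

```python
# Illustrative only: kernel threads have no mm, so the pending-switch check
# must guard against current->mm being NULL (modelled as None here).

class MMContext:
    def __init__(self):
        self.switch_pending = False

class Task:
    def __init__(self, mm):
        self.mm = mm             # None models a kernel thread's NULL mm

def finish_without_guard(current):
    if current.mm.switch_pending:                   # crashes for kernel threads
        pass

def finish_with_guard(current):
    if current.mm and current.mm.switch_pending:    # mkl's added check
        pass

kthread = Task(mm=None)
finish_with_guard(kthread)       # fine: the guard short-circuits on None
try:
    finish_without_guard(kthread)
    crashed = False
except AttributeError:           # the user-space analogue of the NULL deref
    crashed = True
print(crashed)                   # True
```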

Thanks.
Marc Kleine-Budde June 20, 2013, 1:05 p.m. UTC | #3
On 06/20/2013 03:01 PM, Catalin Marinas wrote:
> [...]
> switch_mm() wouldn't be called with a NULL mm, hence we wouldn't call
> check_and_switch_context() either. finish_arch_post_lock_switch() is
> called all the time (and we set the TIF flag only if switch_mm() was
> called).

Thanks for the explanation. $CUSTOMER's mx28 runs stable with the
minimal test program; I'll keep you informed once they have tested
the full-blown program.

Marc
Marc Kleine-Budde June 21, 2013, 10:28 a.m. UTC | #4
On 06/20/2013 01:47 PM, Marc Kleine-Budde wrote:
> [...]
> Subject: [PATCH] arm: Fix deferred mm switch on VIVT processors
> [...]

Works on $CUSTOMER's hardware.

Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>

Catalin, will you take care of the patch? Please add stable on Cc; the
fix is needed on kernels >= v3.5.

regards,
Marc
Catalin Marinas June 21, 2013, 1:52 p.m. UTC | #5
On Fri, Jun 21, 2013 at 11:28:48AM +0100, Marc Kleine-Budde wrote:
> On 06/20/2013 01:47 PM, Marc Kleine-Budde wrote:
> > [...]
> 
> Works on $CUSTOMER's hardware.
> 
> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>

Thanks.

> Catalin, do you take care of the patch? Please add stable on Cc. The fix
> is needed on kernels >= v3.5.

I'll send it to Russell, if possible it should go in for 3.10. I already
added CC stable in my copy (3.5+).
Marc Kleine-Budde July 17, 2013, 8:41 a.m. UTC | #6
On 06/21/2013 03:52 PM, Catalin Marinas wrote:
> On Fri, Jun 21, 2013 at 11:28:48AM +0100, Marc Kleine-Budde wrote:
>> [...]
>> Catalin, do you take care of the patch? Please add stable on Cc. The fix
>> is needed on kernels >= v3.5.
> 
> I'll send it to Russell, if possible it should go in for 3.10. I already
> added CC stable in my copy (3.5+).

What happened to that patch? I cannot find it in v3.10.x or in linus/master.

Marc
Russell King - ARM Linux July 17, 2013, 8:51 a.m. UTC | #7
On Wed, Jul 17, 2013 at 10:41:19AM +0200, Marc Kleine-Budde wrote:
> What happened to that patch? I cannot find it in v3.10.x or in linus/master.

Catalin sent it to me, and I didn't apply it because it's still racy.

We have stateful context switches (due to cache clearing) and this does
nothing to solve the problem there.  If we get preempted during the
cache clearing, things will still go wrong.
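
A rough illustration of the statefulness Russell describes (a toy Python model, not the actual ARM code path; every name here is invented): cpu_switch_mm() on VIVT is a multi-step operation — clean the cache, then re-point the MMU — so if it can be preempted part-way, a nested switch can complete and then be silently undone when the first one resumes.

```python
# Toy model of a preemptible, multi-step cpu_switch_mm() (illustrative only).

def cpu_switch_mm_steps(mm_name, hardware):
    # steps 1..4 stand in for cache cleaning; each yield is a preemption point
    for _ in range(4):
        yield
    hardware["mm"] = mm_name      # final step: MMU now points at mm_name

hardware = {"mm": "X"}

first = cpu_switch_mm_steps("Y", hardware)   # thread 1 starts switching to Y
next(first); next(first)                     # ...and is preempted mid-flush

# thread 2 runs and completes its own switch to Z
for _ in cpu_switch_mm_steps("Z", hardware):
    pass

# thread 1 resumes and finishes its interrupted switch
for _ in first:
    pass

print(hardware["mm"])   # "Y" -- but the last thread actually scheduled wanted Z
```

Clearing a pending flag only after the switch completes does not close this window: the damage happens inside the switch itself.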

See:

http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7771/1

I've no idea what's happening about this.  It looks to me like nothing
happened after Catalin's last comment there.
Marc Kleine-Budde July 17, 2013, 11:48 a.m. UTC | #8
On 07/17/2013 10:51 AM, Russell King - ARM Linux wrote:
> On Wed, Jul 17, 2013 at 10:41:19AM +0200, Marc Kleine-Budde wrote:
>> What happened to that patch? I cannot find it in v3.10.x or in linus/master.
> 
> Catalin sent it to me, and I didn't apply it because it's still racy.
> 
> We have stateful context switches (due to cache clearing) and this does
> nothing to solve the problem there.  If we get preempted during the
> cache clearing, things will still go wrong.

Thanks for the update.

> See:
> 
> http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7771/1
> 
> I've no idea what's happening about this.  It looks to me like nothing
> happened after Catalin's last comment there.

Catalin, what's next?

Marc
Catalin Marinas July 17, 2013, 7:41 p.m. UTC | #9
On Wed, Jul 17, 2013 at 09:51:39AM +0100, Russell King - ARM Linux wrote:
> [...]
> 
> I've no idea what's happening about this.  It looks to me like nothing
> happened after Catalin's last comment there.

Too many other things to do. It's on my list for this week or early next
week.

Patch

diff --git a/arch/arm/include/asm/mmu.h b/arch/arm/include/asm/mmu.h
index e3d5554..d1b4998 100644
--- a/arch/arm/include/asm/mmu.h
+++ b/arch/arm/include/asm/mmu.h
@@ -6,6 +6,8 @@ 
 typedef struct {
 #ifdef CONFIG_CPU_HAS_ASID
 	atomic64_t	id;
+#else
+	int		switch_pending;
 #endif
 	unsigned int	vmalloc_seq;
 } mm_context_t;
diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index a7b85e0..9503a7b 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -47,7 +47,7 @@  static inline void check_and_switch_context(struct mm_struct *mm,
 		 * on non-ASID CPUs, the old mm will remain valid until the
 		 * finish_arch_post_lock_switch() call.
 		 */
-		set_ti_thread_flag(task_thread_info(tsk), TIF_SWITCH_MM);
+		mm->context.switch_pending = 1;
 	else
 		cpu_switch_mm(mm->pgd, mm);
 }
@@ -56,9 +56,11 @@  static inline void check_and_switch_context(struct mm_struct *mm,
 	finish_arch_post_lock_switch
 static inline void finish_arch_post_lock_switch(void)
 {
-	if (test_and_clear_thread_flag(TIF_SWITCH_MM)) {
-		struct mm_struct *mm = current->mm;
+	struct mm_struct *mm = current->mm;
+
+	if (mm && mm->context.switch_pending) {
 		cpu_switch_mm(mm->pgd, mm);
+		mm->context.switch_pending = 0;
 	}
 }
 
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 1995d1a..f00b569 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -156,7 +156,6 @@  extern int vfp_restore_user_hwstate(struct user_vfp __user *,
 #define TIF_USING_IWMMXT	17
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_RESTORE_SIGMASK	20
-#define TIF_SWITCH_MM		22	/* deferred switch_mm */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)