Message ID | 1543599271-14339-1-git-send-email-will.deacon@arm.com (mailing list archive)
---|---
Series | arm64: Only call into preempt_schedule() if need_resched()
On Fri, Nov 30, 2018 at 05:34:29PM +0000, Will Deacon wrote:
> Hi all,
>
> This is version two of the patches I originally posted here:
>
> http://lkml.kernel.org/r/1543347902-21170-1-git-send-email-will.deacon@arm.com
>
> The only change since v1 is that __preempt_count_dec_and_test() now
> reloads the need_resched flag if it initially saw that it was set. This
> resolves the issue spotted by Peter, where an IRQ coming in during the
> decrement can cause a reschedule to be missed.

Yes, I think this one will work, so:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

However, this leaves me wondering if the sequence is actually much
better than what you had?

I suppose there's a win due to cache locality -- you only have to load a
single line -- but I'm thinking that on pure instruction count, you're
not actually winning much.
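[The scheme being acked here is easier to follow in C than in the
disassembly further down. A minimal sketch, assuming a little-endian
layout and invented names (ti_sketch, dec_and_test_sketch) rather than
the kernel's actual asm/preempt.h definitions:

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: the preempt count and an *inverted* need_resched flag
 * share one 64-bit word, so the whole word reads as zero exactly when
 * the count has hit zero and a reschedule is pending.
 */
struct ti_sketch {
	union {
		uint64_t preempt_count;         /* 0 => preemptible + resched */
		struct {                        /* little-endian layout assumed */
			uint32_t count;         /* preemption nesting depth */
			uint32_t need_resched;  /* inverted: 0 == resched needed */
		} preempt;
	};
};

static bool dec_and_test_sketch(struct ti_sketch *ti)
{
	uint64_t pc = ti->preempt_count;        /* one 64-bit load */

	/* Store back only the low word; an IRQ updating need_resched
	 * between the load and this store is not clobbered. */
	ti->preempt.count = (uint32_t)--pc;

	/*
	 * Fast path: the decremented word is all zeroes, i.e. we are
	 * preemptible and a reschedule is pending. Otherwise reload and
	 * re-test, so a resched flagged by an IRQ during the sequence
	 * is not missed -- that reload is the v1 -> v2 change.
	 */
	return !pc || !ti->preempt_count;
}
]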
Hi Peter,

On Thu, Dec 06, 2018 at 04:08:50PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 05:34:29PM +0000, Will Deacon wrote:
> > This is version two of the patches I originally posted here:
> >
> > http://lkml.kernel.org/r/1543347902-21170-1-git-send-email-will.deacon@arm.com
> >
> > The only change since v1 is that __preempt_count_dec_and_test() now
> > reloads the need_resched flag if it initially saw that it was set. This
> > resolves the issue spotted by Peter, where an IRQ coming in during the
> > decrement can cause a reschedule to be missed.
>
> Yes, I think this one will work, so:
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks!

> However, this leaves me wondering if the sequence is actually much
> better than what you had?
>
> I suppose there's a win due to cache locality -- you only have to load a
> single line -- but I'm thinking that on pure instruction count, you're
> not actually winning much.

The fast path is still slightly shorter in terms of executed
instructions, but you're right that the win is likely to be because
everything hits in the cache or the store buffer when we're not
preempting, so we should run through the code reasonably quickly and
avoid the unconditional call to preempt_schedule().

Will

--->8

// Before
  20: a9bf7bfd  stp   x29, x30, [sp, #-16]!
  24: 910003fd  mov   x29, sp
  28: d5384101  mrs   x1, sp_el0
  2c: b9401020  ldr   w0, [x1, #16]
  30: 51000400  sub   w0, w0, #0x1
  34: b9001020  str   w0, [x1, #16]
  38: 350000a0  cbnz  w0, 4c <preempt_enable+0x2c>
  3c: f9400020  ldr   x0, [x1]
  40: 721f001f  tst   w0, #0x2
  44: 54000040  b.eq  4c <preempt_enable+0x2c>  // b.none
  48: 94000000  bl    0 <preempt_schedule>
  4c: a8c17bfd  ldp   x29, x30, [sp], #16
  50: d65f03c0  ret

// After
  20: a9bf7bfd  stp   x29, x30, [sp, #-16]!
  24: 910003fd  mov   x29, sp
  28: d5384101  mrs   x1, sp_el0
  2c: f9400820  ldr   x0, [x1, #16]
  30: d1000400  sub   x0, x0, #0x1
  34: b9001020  str   w0, [x1, #16]
  38: b5000080  cbnz  x0, 48 <preempt_enable+0x28>
  3c: 94000000  bl    0 <preempt_schedule>
  40: a8c17bfd  ldp   x29, x30, [sp], #16
  44: d65f03c0  ret
  48: f9400820  ldr   x0, [x1, #16]
  4c: b5ffffa0  cbnz  x0, 40 <preempt_enable+0x20>
  50: 94000000  bl    0 <preempt_schedule>
  54: 17fffffb  b     40 <preempt_enable+0x20>
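[The two bl sites in the "After" dump line up with the two ways the
sketch above can return true: the all-zero fast path at 3c and the
post-reload slow path at 50. A toy harness for the sketch, using the
same assumed names as above (the IRQ race itself cannot fire in a
single-threaded demo, so only the race-free paths are exercised):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	struct ti_sketch ti;

	/* Fast path: count drops 1 -> 0 with a resched already pending,
	 * so the whole 64-bit word decrements to zero. */
	ti.preempt.count = 1;
	ti.preempt.need_resched = 0;    /* inverted: 0 == resched needed */
	assert(dec_and_test_sketch(&ti));

	/* No resched pending: the word stays non-zero, the reload sees
	 * the same thing, and preempt_schedule() would be skipped. */
	ti.preempt.count = 1;
	ti.preempt.need_resched = 1;
	assert(!dec_and_test_sketch(&ti));

	puts("sketch ok");
	return 0;
}
]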