Message ID | 1508837889-16932-3-git-send-email-byungchul.park@lge.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* Byungchul Park <byungchul.park@lge.com> wrote:
> Johan Hovold reported a performance regression by crossrelease like:
Pplease add Reported-by and Analyzed-by tags - you didn't even Cc: Johan!
Thanks,
Ingo
Cannot pick up this series yet, but I have enhanced the changelog to: =============> Subject: locking/lockdep: Introduce CONFIG_CROSSRELEASE_STACK_TRACE and make it not unwind by default From: Byungchul Park <byungchul.park@lge.com> Date: Tue, 24 Oct 2017 18:38:03 +0900 Johan Hovold reported a heavy performance regression caused by lockdep cross-release: > Boot time (from "Linux version" to login prompt) had in fact doubled > since 4.13 where it took 17 seconds (with my current config) compared to > the 35 seconds I now see with 4.14-rc4. > > I quick bisect pointed to lockdep and specifically the following commit: > > 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition > of a crosslock") > > which I've verified is the commit which doubled the boot time (compared > to 28a903f63ec0^) (added by lockdep crossrelease series [1]). Currently crossrelease performs unwind on every acquisition, but that is very expensive. This patch makes unwind optional and disables it by default and only records acquire_ip. Full stack traces are sometimes required for full analysis, in which case CROSSRELEASE_STACK_TRACE can be enabled. On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU', where $QEMU launches a kernel with init=/bin/true: 1. No lockdep enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.756558155 seconds time elapsed ( +- 0.09% ) 2. Lockdep enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.968710420 seconds time elapsed ( +- 0.12% ) 3. Lockdep enabled + crossrelease enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 3.153839636 seconds time elapsed ( +- 0.31% ) 4. Lockdep enabled + crossrelease enabled + this patch applied: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.963669551 seconds time elapsed ( +- 0.11% ) I.e. lockdep-crossrelease performance is now indistinguishable from vanilla lockdep.
On Tue, Oct 24, 2017 at 12:05:16PM +0200, Ingo Molnar wrote: > > * Byungchul Park <byungchul.park@lge.com> wrote: > > > Johan Hovold reported a performance regression by crossrelease like: > > Pplease add Reported-by and Analyzed-by tags - you didn't even Cc: Johan! Excuse me but, I am sure, whom is the issue analyzed by? Me? > Thanks, > > Ingo
* Byungchul Park <byungchul.park@lge.com> wrote: > On Tue, Oct 24, 2017 at 12:05:16PM +0200, Ingo Molnar wrote: > > > > * Byungchul Park <byungchul.park@lge.com> wrote: > > > > > Johan Hovold reported a performance regression by crossrelease like: > > > > Pplease add Reported-by and Analyzed-by tags - you didn't even Cc: Johan! > > Excuse me but, I am sure, whom is the issue analyzed by? Me? Well, Johan tracked it all down for us, Thomas gave the right suggestion to fix the performance regression, so I meant something like: Reported-by: Johan Hovold <johan@kernel.org> Bisected-by: Johan Hovold <johan@kernel.org> Analyzed-by: Thomas Gleixner <tglx@linutronix.de> Thanks, Ingo
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index bfa8e0b..70358b5 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -278,7 +278,11 @@ struct held_lock { }; #ifdef CONFIG_LOCKDEP_CROSSRELEASE +#ifdef CONFIG_CROSSRELEASE_STACK_TRACE #define MAX_XHLOCK_TRACE_ENTRIES 5 +#else +#define MAX_XHLOCK_TRACE_ENTRIES 1 +#endif /* * This is for keeping locks waiting for commit so that true dependencies diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index e36e652..5c2ddf2 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -4863,8 +4863,13 @@ static void add_xhlock(struct held_lock *hlock) xhlock->trace.nr_entries = 0; xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES; xhlock->trace.entries = xhlock->trace_entries; +#ifdef CONFIG_CROSSRELEASE_STACK_TRACE xhlock->trace.skip = 3; save_stack_trace(&xhlock->trace); +#else + xhlock->trace.nr_entries = 1; + xhlock->trace.entries[0] = hlock->acquire_ip; +#endif } static inline int same_context_xhlock(struct hist_lock *xhlock) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 3db9167..90ea784 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1225,6 +1225,21 @@ config LOCKDEP_COMPLETIONS A deadlock caused by wait_for_completion() and complete() can be detected by lockdep using crossrelease feature. +config CROSSRELEASE_STACK_TRACE + bool "Record more than one entity of stack trace in crossrelease" + depends on LOCKDEP_CROSSRELEASE + default n + help + The lockdep "cross-release" feature needs to record stack traces + (of calling functions) for all acquisitions, for eventual later + use during analysis. By default only a single caller is recorded, + because the unwind operation can be very expensive with deeper + stack chains. However, sometimes deeper traces are required for + full analysis. This option turns on the saving of the full stack + trace entries. + + If unsure, say N. + config DEBUG_LOCKDEP bool "Lock dependency engine debugging" depends on DEBUG_KERNEL && LOCKDEP