Message ID: 20190418084119.056416939@linutronix.de (mailing list archive)
Series: stacktrace: Consolidate stack trace usage

On Thu, 18 Apr 2019 14:58:55 -0500
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> > Tom,
> >
> > Can you review this too?
>
> Looks good to me too!
>
> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>

Would you be OK to upgrade this to a Reviewed-by tag?

Thanks!

-- Steve

On Thu, 18 Apr 2019 23:14:45 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:
> > On Thu, Apr 18, 2019 at 10:41:20AM +0200, Thomas Gleixner wrote:
> > > - Remove the extra array member of stack_dump_trace[]. It's not required as
> > >   the stack tracer stores at max array size - 1 entries so there is still
> > >   an empty slot.
> >
> > What is the empty slot used for?
>
> I was trying to find an answer but failed. Maybe it's just historical
> leftovers or Steven knows where the magic is in this maze.

I believe it was for historical leftovers (there was a time it was
required), and left there for "paranoid" sake. But let me apply the
patch and see if it is really needed.

-- Steve

On Thu, 18 Apr 2019 17:24:43 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I believe it was for historical leftovers (there was a time it was
> required), and left there for "paranoid" sake. But let me apply the
> patch and see if it is really needed.

I removed the +1 on the max_entries and set STACK_TRACE_ENTRIES to 5 (a
bit extreme). Then I ran the stack tracing with KASAN enabled and it
never complained.

As stated, it was there for historical reasons and I felt 500 was way
more than enough and left the buffer there just out of laziness and
paranoia. Feel free to remove that if you want.

-- Steve

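For context, the array being discussed looks roughly like the sketch below; this is a simplified illustration of the sizing relationship (the spare guard slot versus the "size - 1" fill limit), not the verbatim kernel/trace/trace_stack.c code.

#define STACK_TRACE_ENTRIES 500

/*
 * Historically the array carried one spare slot on top:
 *
 *     static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1];
 *
 * but the stack tracer limits itself to STACK_TRACE_ENTRIES - 1 recorded
 * entries, so the spare slot can never be written and the array can
 * simply be sized to STACK_TRACE_ENTRIES.
 */
static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES];

/* Fill limit referred to in the thread: stop one short of the array size. */
static unsigned int max_entries = STACK_TRACE_ENTRIES - 1;
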
On Thu, 18 Apr 2019 10:41:20 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> @@ -412,23 +404,20 @@ stack_trace_sysctl(struct ctl_table *tab
>                    void __user *buffer, size_t *lenp,
>                    loff_t *ppos)
>  {
> -        int ret;
> +        int ret, was_enabled;

One small nit. Could this be:

        int was_enabled;
        int ret;

I prefer only joining variables that are related on the same line.
Makes it look cleaner IMO.

>
>          mutex_lock(&stack_sysctl_mutex);
> +        was_enabled = !!stack_tracer_enabled;
>

Bah, not sure why I didn't do it this way to begin with. I think I
copied something else that couldn't do it this way for some reason and
didn't put any brain power behind the copy. :-/ But that was back in
2008 so I blame it on being "young and stupid" ;-)

Other than the above nit and removing the unneeded +1 in max_entries:

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

>          ret = proc_dointvec(table, write, buffer, lenp, ppos);
>
> -        if (ret || !write ||
> -            (last_stack_tracer_enabled == !!stack_tracer_enabled))
> +        if (ret || !write || (was_enabled == !!stack_tracer_enabled))
>                  goto out;
>
> -        last_stack_tracer_enabled = !!stack_tracer_enabled;
> -
>          if (stack_tracer_enabled)
>                  register_ftrace_function(&trace_ops);
>          else
>                  unregister_ftrace_function(&trace_ops);
> -
>  out:
>          mutex_unlock(&stack_sysctl_mutex);
>          return ret;
> @@ -444,7 +433,6 @@ static __init int enable_stacktrace(char
>          strncpy(stack_trace_filter_buf, str + len, COMMAND_LINE_SIZE);
>
>          stack_tracer_enabled = 1;
> -        last_stack_tracer_enabled = 1;
>          return 1;
>  }
>  __setup("stacktrace", enable_stacktrace);
>

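For readers who want the end result in one piece, the handler after this change (with the declaration nit applied) comes out roughly as follows. This is a condensed sketch assembled from the hunks quoted above, not the verbatim file contents:

static int
stack_trace_sysctl(struct ctl_table *table, int write,
                   void __user *buffer, size_t *lenp, loff_t *ppos)
{
        int was_enabled;
        int ret;

        mutex_lock(&stack_sysctl_mutex);
        /* Snapshot the normalized state before the write is processed. */
        was_enabled = !!stack_tracer_enabled;

        ret = proc_dointvec(table, write, buffer, lenp, ppos);

        /* Nothing to do on read, on error, or if the state did not change. */
        if (ret || !write || (was_enabled == !!stack_tracer_enabled))
                goto out;

        if (stack_tracer_enabled)
                register_ftrace_function(&trace_ops);
        else
                unregister_ftrace_function(&trace_ops);
out:
        mutex_unlock(&stack_sysctl_mutex);
        return ret;
}
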
On Fri, 19 Apr 2019 00:44:17 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 18 Apr 2019, Steven Rostedt wrote:
> > On Thu, 18 Apr 2019 10:41:20 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > > @@ -412,23 +404,20 @@ stack_trace_sysctl(struct ctl_table *tab
> > >                    void __user *buffer, size_t *lenp,
> > >                    loff_t *ppos)
> > >  {
> > > -        int ret;
> > > +        int ret, was_enabled;
> >
> > One small nit. Could this be:
> >
> >         int was_enabled;
> >         int ret;
> >
> > I prefer only joining variables that are related on the same line.
> > Makes it look cleaner IMO.
>
> If you wish so. To me it's waste of screen space :)

At least you didn't say it helps the compiler ;-)

> >
> > >          mutex_lock(&stack_sysctl_mutex);
> > > +        was_enabled = !!stack_tracer_enabled;
> > >
> >
> > Bah, not sure why I didn't do it this way to begin with. I think I
> > copied something else that couldn't do it this way for some reason and
> > didn't put any brain power behind the copy. :-/ But that was back in
> > 2008 so I blame it on being "young and stupid" ;-)
>
> The young part is gone for sure :)

I purposely set you up for that response.

> > Other than the above nit and removing the unneeded +1 in max_entries:
>
> s/+1/-1/

That was an ode to G+

-- Steve

On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:
> > Another idea I had (but never got a chance to work on) was to extend the
> > x86 unwind interface to all arches. So instead of the callbacks, each
> > arch would implement something like this API:
>
> I surely thought about that, but after staring at all incarnations of
> arch/*/stacktrace.c I just gave up.
>
> Aside of that quite some archs already have callback based unwinders
> because they use them for more than stacktracing and just have a single
> implementation of that loop.
>
> I'm fine either way. We can start with x86 and then let archs convert over
> their stuff, but I wouldn't hold my breath that this will be completed in
> the foreseeable future.

I suggested the same to Thomas early on, and I even spent the time to
convert some $random arch to the iterator interface, and while it is
indeed entirely feasible, it is _far_ more work.

The callback thing OTOH is flexible enough to do what we want to do now,
and allows converting most archs to it without too much pain (as Thomas
said, many archs are already in this form and only need minor API
adjustments), which gets us in a far better place than we are now.

And we can always go to iterators later on. But I think getting the
generic unwinder improved across all archs is a really important first
step here.

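To make the two interface styles concrete: the callback-based arch hook from this series can be written as a thin loop around an iterator-style unwinder such as the x86 one. The sketch below illustrates that relationship under those assumptions; it is not the series' actual arch/x86 implementation.

#include <linux/sched.h>
#include <linux/stacktrace.h>
#include <asm/unwind.h>

/* Callback-style entry point wrapping the x86 iterator unwinder. */
void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
                     struct task_struct *task, struct pt_regs *regs)
{
        struct unwind_state state;
        unsigned long addr;

        /* If regs were handed in, report the interrupted IP first. */
        if (regs && !consume_entry(cookie, regs->ip, false))
                return;

        for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
             unwind_next_frame(&state)) {
                addr = unwind_get_return_address(&state);
                /* Plain walks make no reliability claim, hence "false". */
                if (!addr || !consume_entry(cookie, addr, false))
                        break;
        }
}
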
On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:

> +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> +				       bool reliable);

> +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> +		     struct task_struct *task, struct pt_regs *regs);
> +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> +			     struct task_struct *task);

This bugs me a little; ideally the _reliable() thing would not exist.

Thomas said that the existing __save_stack_trace_reliable() is different
enough for the unification to be non-trivial, but maybe Josh can help
out?

From what I can see the biggest significant differences are:

 - it looks at the regs sets on the stack and for FP bails early

 - bails for kthreads and idle (after it does all the hard work!?!)

The first (FP checking for exceptions) should probably be reflected in
consume_fn(.reliable) anyway -- although that would mean a lot of extra
'?' entries where there are none today.

And the second (KTHREAD/IDLE) is something that the generic code can
easily do before calling into the arch unwinder.

Hmm?

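A sketch of what "the generic code can easily do before calling into the arch unwinder" could look like for the kthread/idle case. The cookie layout, helper names and error codes here are assumptions made for illustration; only the arch_stack_walk_reliable() prototype comes from the series.

/* Hypothetical generic wrapper; names and error codes are illustrative. */
struct stacktrace_cookie {
        unsigned long   *store;
        unsigned int    size;
        unsigned int    len;
};

static bool stack_trace_consume_entry(void *cookie, unsigned long addr,
                                      bool reliable)
{
        struct stacktrace_cookie *c = cookie;

        if (c->len >= c->size)
                return false;
        c->store[c->len++] = addr;
        return true;
}

int stack_trace_save_tsk_reliable(struct task_struct *tsk,
                                  unsigned long *store, unsigned int size)
{
        struct stacktrace_cookie c = { .store = store, .size = size };
        int ret;

        /* Generic policy: idle tasks and kthreads never yield reliable traces. */
        if (is_idle_task(tsk) || (tsk->flags & PF_KTHREAD))
                return -EINVAL;

        ret = arch_stack_walk_reliable(stack_trace_consume_entry, &c, tsk);
        return ret ? ret : (int)c.len;
}
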
On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote:
> On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> >
> > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > > +				       bool reliable);
> >
> > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > > +		     struct task_struct *task, struct pt_regs *regs);
> > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> > > +			     struct task_struct *task);
> >
> > This bugs me a little; ideally the _reliable() thing would not exist.
> >
> > Thomas said that the existing __save_stack_trace_reliable() is different
> > enough for the unification to be non-trivial, but maybe Josh can help
> > out?
> >
> > From what I can see the biggest significant differences are:
> >
> >  - it looks at the regs sets on the stack and for FP bails early
> >
> >  - bails for kthreads and idle (after it does all the hard work!?!)
> >
> > The first (FP checking for exceptions) should probably be reflected in
> > consume_fn(.reliable) anyway -- although that would mean a lot of extra
> > '?' entries where there are none today.
> >
> > And the second (KTHREAD/IDLE) is something that the generic code can
> > easily do before calling into the arch unwinder.
>
> And looking at the powerpc version of it, that has even more interesting
> extra checks in that function.

Right, but not fundamentally different from determining @reliable I
think.

Anyway, it would be good if someone knowledgeable could have a look at
this.

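One way the "reflected in consume_fn(.reliable)" idea could look from the consumer side: instead of the arch unwinder bailing out, a single unreliable frame just taints the whole trace. The cookie layout below is made up for this sketch; it is not existing kernel code.

struct reliable_cookie {
        unsigned long   *store;
        unsigned int    size;
        unsigned int    len;
        bool            reliable;       /* whole trace still trustworthy? */
};

/*
 * Record every address, but let one unreliable frame (e.g. a frame
 * without a valid frame pointer) taint the trace instead of aborting
 * the walk.  Best-effort users simply ignore the taint flag.
 */
static bool consume_entry_reliable(void *cookie, unsigned long addr,
                                   bool reliable)
{
        struct reliable_cookie *c = cookie;

        if (!reliable)
                c->reliable = false;
        if (c->len >= c->size)
                return false;
        c->store[c->len++] = addr;
        return true;
}
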
On Thu, 18 Apr 2019 10:41:41 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> It's only used in trace.c and there is absolutely no point in compiling it
> in when user space stack traces are not supported.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>

Funny, these were moved out to global functions along with the
ftrace_trace_stack() but I guess they were never used. This basically
just does a partial revert of:

 c0a0d0d3f6528 ("tracing/core: Make the stack entry helpers global")

> ---
>  kernel/trace/trace.c |   14 ++++++++------
>  kernel/trace/trace.h |    8 --------
>  2 files changed, 8 insertions(+), 14 deletions(-)
>
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -159,6 +159,8 @@ static union trace_eval_map_item *trace_
>  #endif /* CONFIG_TRACE_EVAL_MAP_FILE */
>
>  static int tracing_set_tracer(struct trace_array *tr, const char *buf);
> +static void ftrace_trace_userstack(struct ring_buffer *buffer,
> +				   unsigned long flags, int pc);
>
>  #define MAX_TRACER_SIZE		100
>  static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata;
> @@ -2905,9 +2907,10 @@ void trace_dump_stack(int skip)
>  }
>  EXPORT_SYMBOL_GPL(trace_dump_stack);
>
> +#ifdef CONFIG_USER_STACKTRACE_SUPPORT
>  static DEFINE_PER_CPU(int, user_stack_count);
>
> -void
> +static void
>  ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
>  {
>          struct trace_event_call *call = &event_user_stack;
> @@ -2958,13 +2961,12 @@ ftrace_trace_userstack(struct ring_buffe
>  out:
>          preempt_enable();
>  }
> -
> -#ifdef UNUSED

Strange, I never knew about this ifdef. I would have nuked it when I
saw it.

Anyway,

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> -static void __trace_userstack(struct trace_array *tr, unsigned long flags)
> +#else /* CONFIG_USER_STACKTRACE_SUPPORT */
> +static void ftrace_trace_userstack(struct ring_buffer *buffer,
> +				   unsigned long flags, int pc)
>  {
> -        ftrace_trace_userstack(tr, flags, preempt_count());
>  }
> -#endif /* UNUSED */
> +#endif /* !CONFIG_USER_STACKTRACE_SUPPORT */
>
>  #endif /* CONFIG_STACKTRACE */
>
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -782,17 +782,9 @@ void update_max_tr_single(struct trace_a
>  #endif /* CONFIG_TRACER_MAX_TRACE */
>
>  #ifdef CONFIG_STACKTRACE
> -void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags,
> -			    int pc);
> -
>  void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
>  		   int pc);
>  #else
> -static inline void ftrace_trace_userstack(struct ring_buffer *buffer,
> -					  unsigned long flags, int pc)
> -{
> -}
> -
>  static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
>  				 int skip, int pc)
>  {
>

On Fri, Apr 19, 2019 at 09:02:11AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> > On Thu, 18 Apr 2019, Josh Poimboeuf wrote:
> > > Another idea I had (but never got a chance to work on) was to extend the
> > > x86 unwind interface to all arches. So instead of the callbacks, each
> > > arch would implement something like this API:
> >
> > I surely thought about that, but after staring at all incarnations of
> > arch/*/stacktrace.c I just gave up.
> >
> > Aside of that quite some archs already have callback based unwinders
> > because they use them for more than stacktracing and just have a single
> > implementation of that loop.
> >
> > I'm fine either way. We can start with x86 and then let archs convert over
> > their stuff, but I wouldn't hold my breath that this will be completed in
> > the foreseeable future.
>
> I suggested the same to Thomas early on, and I even spent the time to
> convert some $random arch to the iterator interface, and while it is
> indeed entirely feasible, it is _far_ more work.
>
> The callback thing OTOH is flexible enough to do what we want to do now,
> and allows converting most archs to it without too much pain (as Thomas
> said, many archs are already in this form and only need minor API
> adjustments), which gets us in a far better place than we are now.
>
> And we can always go to iterators later on. But I think getting the
> generic unwinder improved across all archs is a really important first
> step here.

Fair enough.

On Fri, Apr 19, 2019 at 11:07:17AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote:
> > On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> > > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > >
> > > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > > > +				       bool reliable);
> > >
> > > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > > > +		     struct task_struct *task, struct pt_regs *regs);
> > > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> > > > +			     struct task_struct *task);
> > >
> > > This bugs me a little; ideally the _reliable() thing would not exist.
> > >
> > > Thomas said that the existing __save_stack_trace_reliable() is different
> > > enough for the unification to be non-trivial, but maybe Josh can help
> > > out?
> > >
> > > From what I can see the biggest significant differences are:
> > >
> > >  - it looks at the regs sets on the stack and for FP bails early
> > >
> > >  - bails for kthreads and idle (after it does all the hard work!?!)

That's done for a reason, see the "Success path" comments.

> > > The first (FP checking for exceptions) should probably be reflected in
> > > consume_fn(.reliable) anyway -- although that would mean a lot of extra
> > > '?' entries where there are none today.
> > >
> > > And the second (KTHREAD/IDLE) is something that the generic code can
> > > easily do before calling into the arch unwinder.
> >
> > And looking at the powerpc version of it, that has even more interesting
> > extra checks in that function.
>
> Right, but not fundamentally different from determining @reliable I
> think.
>
> Anyway, it would be good if someone knowledgeable could have a look at
> this.

Yeah, we could probably do that.  The flow would need to be changed a
bit -- some of the errors are soft errors which most users don't care
about because they just want a best effort.  The soft errors can be
remembered without breaking out of the loop, and just returned at the
end.  Most users could just ignore the return code.

The only thing I'd be worried about is performance for the non-livepatch
users, but I guess those checks don't look very expensive.  And the x86
unwinders are already pretty slow anyway...

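A sketch of the "soft errors remembered without breaking out of the loop" flow Josh describes, using the x86 unwind iterator for concreteness. The split between hard and soft errors and the specific error codes are assumptions for illustration, not existing kernel policy.

#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>
#include <asm/unwind.h>

/*
 * Hypothetical reworked reliable walk: hard errors abort the walk, soft
 * errors (checks only livepatch-style users care about) are accumulated
 * and reported at the end, so best-effort callers can ignore the return
 * value and still get a full trace.
 */
static int walk_stack_checked(struct task_struct *task,
                              stack_trace_consume_fn consume_entry,
                              void *cookie)
{
        struct unwind_state state;
        unsigned long addr;
        int soft_err = 0;

        for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
             unwind_next_frame(&state)) {
                addr = unwind_get_return_address(&state);
                if (!addr)
                        return -EINVAL;         /* hard error: corrupt state */

                /* Soft error: remember it, but keep walking. */
                if (unwind_error(&state))
                        soft_err = -ENOENT;

                if (!consume_entry(cookie, addr, !soft_err))
                        return -E2BIG;          /* hard error: out of space */
        }

        return soft_err;        /* 0, or a soft error most callers ignore */
}
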
On Thu, 18 Apr 2019 10:41:42 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

On Thu, 18 Apr 2019 10:41:43 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Simplify the stack retrieval code by using the storage array based
> interface.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

On Thu, 18 Apr 2019, Thomas Gleixner wrote:

> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Acked-by: Miroslav Benes <mbenes@suse.cz>

Feel free to take it through tip or let us know to pick it up.

Miroslav

On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote:
> There is only one caller of check_prev_add() which hands in a zeroed struct
> stack trace and a function pointer to save_stack(). Inside check_prev_add()
> the stack_trace struct is checked for being empty, which is always
> true. Based on that one code path stores a stack trace which is unused. The
> comment there does not make sense either. It's all leftovers from
> historical lockdep code (cross release).

I was more or less expecting a revert of:

  ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

And then I read the comment that went with the "static struct
stack_trace trace" that got removed (in the above commit) and realized
that your patch will consume more stack entries.

The problem is when the held lock stack in check_prevs_add() has multiple
trylock entries on top, in that case we call check_prev_add() multiple
times, and this patch will then save the exact same stack-trace multiple
times, consuming static resources.

Possibly we should copy what stackdepot does (but we cannot use it
directly because stackdepot uses locks; but possibly we can share bits),
but that is a patch for another day I think.

So while convoluted, perhaps we should retain this code for now.

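For reference, the "copy what stackdepot does" idea amounts to deduplicating traces by content before committing them to the static entry pool, along the lines below. All names are invented for this sketch; it is neither the lockdep nor the stackdepot code, and it relies on check_prev_add() already running under the graph lock, which is why no separate locking (the reason stackdepot cannot be used directly) appears here.

#include <linux/jhash.h>
#include <linux/string.h>
#include <linux/errno.h>

#define MAX_SAVED_TRACES        512
#define MAX_SAVED_ENTRIES       (16 * 1024)

struct saved_trace {
        u32             hash;
        unsigned int    offset;         /* into saved_entries[] */
        unsigned int    nr_entries;
};

static struct saved_trace       saved_traces[MAX_SAVED_TRACES];
static unsigned long            saved_entries[MAX_SAVED_ENTRIES];
static unsigned int             nr_saved_traces, nr_saved_entries;

/*
 * Return the index of a stored trace identical to entries[], storing a
 * new copy only when no identical trace exists yet, so repeated
 * check_prev_add() calls with the same trace consume no extra storage.
 */
static int save_trace_dedup(unsigned long *entries, unsigned int nr_entries)
{
        u32 hash = jhash2((u32 *)entries,
                          nr_entries * sizeof(unsigned long) / sizeof(u32), 0);
        unsigned int i;

        for (i = 0; i < nr_saved_traces; i++) {
                struct saved_trace *t = &saved_traces[i];

                if (t->hash == hash && t->nr_entries == nr_entries &&
                    !memcmp(&saved_entries[t->offset], entries,
                            nr_entries * sizeof(unsigned long)))
                        return i;       /* reuse the existing copy */
        }

        if (nr_saved_traces >= MAX_SAVED_TRACES ||
            nr_saved_entries + nr_entries > MAX_SAVED_ENTRIES)
                return -ENOMEM;

        memcpy(&saved_entries[nr_saved_entries], entries,
               nr_entries * sizeof(unsigned long));
        saved_traces[nr_saved_traces] = (struct saved_trace){
                .hash           = hash,
                .offset         = nr_saved_entries,
                .nr_entries     = nr_entries,
        };
        nr_saved_entries += nr_entries;
        return nr_saved_traces++;
}
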
On Thu, Apr 18, 2019 at 10:41:38AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces and storing the information in a small lockdep
> specific data structure.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

On Wed, 24 Apr 2019, Peter Zijlstra wrote:

> On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote:
> > There is only one caller of check_prev_add() which hands in a zeroed struct
> > stack trace and a function pointer to save_stack(). Inside check_prev_add()
> > the stack_trace struct is checked for being empty, which is always
> > true. Based on that one code path stores a stack trace which is unused. The
> > comment there does not make sense either. It's all leftovers from
> > historical lockdep code (cross release).
>
> I was more or less expecting a revert of:
>
>   ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
>
> And then I read the comment that went with the "static struct
> stack_trace trace" that got removed (in the above commit) and realized
> that your patch will consume more stack entries.
>
> The problem is when the held lock stack in check_prevs_add() has multiple
> trylock entries on top, in that case we call check_prev_add() multiple
> times, and this patch will then save the exact same stack-trace multiple
> times, consuming static resources.
>
> Possibly we should copy what stackdepot does (but we cannot use it
> directly because stackdepot uses locks; but possibly we can share bits),
> but that is a patch for another day I think.
>
> So while convoluted, perhaps we should retain this code for now.

Uurg, what a mess.