Message ID | 1d1ad5692ee43d4fc2b3fd9d221331d30b36123f.1700502145.git.andreyknvl@google.com (mailing list archive)
---|---
State | New
Series | stackdepot: allow evicting stack traces
On Mon, Nov 20, 2023 at 06:47:15PM +0100, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Add stack_depot_put, a function that decrements the reference counter
> on a stack record and removes it from the stack depot once the counter
> reaches 0.
>
> Internally, when removing a stack record, the function unlinks it from
> the hash table bucket and returns it to the freelist.
>
> With this change, the users of stack depot can call stack_depot_put
> when keeping a stack trace in the stack depot is no longer needed.
> This avoids polluting the stack depot with irrelevant stack traces and
> thus leaves more space to store the relevant ones before the stack
> depot reaches its capacity.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>

I have yet to review the final bits of this series, but I'd like to
comment on something below.

> +void stack_depot_put(depot_stack_handle_t handle)
> +{
> +	struct stack_record *stack;
> +	unsigned long flags;
> +
> +	if (!handle || stack_depot_disabled)
> +		return;
> +
> +	write_lock_irqsave(&pool_rwlock, flags);
> +
> +	stack = depot_fetch_stack(handle);
> +	if (WARN_ON(!stack))
> +		goto out;
> +
> +	if (refcount_dec_and_test(&stack->count)) {
> +		/* Unlink stack from the hash table. */
> +		list_del(&stack->list);
> +
> +		/* Free stack. */
> +		depot_free_stack(stack);

It would be great if stack_depot_put also accepted a boolean, which
would determine whether we want to erase the stack or not.

For the feature I'm working on for page_owner [1], I need to keep track
of how many times we allocated/freed from a certain path, which may
expose a potential leak, and I was using the refcount to do that. But I
don't want the record to be erased, because this new functionality
won't be mutually exclusive with the existing one; e.g., you can check
/sys/kernel/debug/page_owner AND /sys/kernel/debug/page_owner_stacks.

So, while the new functionality won't care whether a record has been
erased, the old one will, and information would be lost.

[1] https://patchwork.kernel.org/project/linux-mm/cover/20231120084300.4368-1-osalvador@suse.de/
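For illustration, here is a minimal sketch of the variant being proposed, derived from the quoted code above. The boolean `evict` parameter is hypothetical, and, as the replies below show, this shape was not adopted:

```c
/*
 * Hypothetical sketch of the proposed variant, not code from the series:
 * decrement the refcount, but only unlink and free the record when
 * @evict is true, so a counting-only user keeps the record around.
 */
void stack_depot_put(depot_stack_handle_t handle, bool evict)
{
	struct stack_record *stack;
	unsigned long flags;

	if (!handle || stack_depot_disabled)
		return;

	write_lock_irqsave(&pool_rwlock, flags);

	stack = depot_fetch_stack(handle);
	if (WARN_ON(!stack))
		goto out;

	if (refcount_dec_and_test(&stack->count) && evict) {
		/* Unlink stack from the hash table and return it to the freelist. */
		list_del(&stack->list);
		depot_free_stack(stack);
	}
out:
	write_unlock_irqrestore(&pool_rwlock, flags);
}
```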
On Thu, 4 Jan 2024 at 09:52, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Mon, Nov 20, 2023 at 06:47:15PM +0100, andrey.konovalov@linux.dev wrote:
> > From: Andrey Konovalov <andreyknvl@google.com>
> >
> > Add stack_depot_put, a function that decrements the reference counter
> > on a stack record and removes it from the stack depot once the counter
> > reaches 0.
> >
> > Internally, when removing a stack record, the function unlinks it from
> > the hash table bucket and returns it to the freelist.
> >
> > With this change, the users of stack depot can call stack_depot_put
> > when keeping a stack trace in the stack depot is no longer needed.
> > This avoids polluting the stack depot with irrelevant stack traces and
> > thus leaves more space to store the relevant ones before the stack
> > depot reaches its capacity.
> >
> > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> I have yet to review the final bits of this series, but I'd like to
> comment on something below.
>
> > +void stack_depot_put(depot_stack_handle_t handle)
> > +{
> > +	struct stack_record *stack;
> > +	unsigned long flags;
> > +
> > +	if (!handle || stack_depot_disabled)
> > +		return;
> > +
> > +	write_lock_irqsave(&pool_rwlock, flags);
> > +
> > +	stack = depot_fetch_stack(handle);
> > +	if (WARN_ON(!stack))
> > +		goto out;
> > +
> > +	if (refcount_dec_and_test(&stack->count)) {
> > +		/* Unlink stack from the hash table. */
> > +		list_del(&stack->list);
> > +
> > +		/* Free stack. */
> > +		depot_free_stack(stack);
>
> It would be great if stack_depot_put also accepted a boolean, which
> would determine whether we want to erase the stack or not.

I think a boolean makes the interface more confusing for everyone
else. At that point stack_depot_put merely decrements the refcount and
becomes a wrapper around refcount_dec, right?

I think you want to expose the stack_record struct anyway for your
series, so why not simply avoid calling stack_depot_put and decrement
the refcount with your own helper (there needs to be a new stackdepot
function to return a stack_record under the pool_rwlock held as
reader).

Also, you need to ensure no one else calls stack_depot_put on the stack
traces you want to keep. If there is a risk someone else may call
stack_depot_put on them, it obviously won't work (I think the only
option then is to introduce a way to pin stacks).

> For the feature I'm working on for page_owner [1], I need to keep track
> of how many times we allocated/freed from a certain path, which may
> expose a potential leak, and I was using the refcount to do that. But I
> don't want the record to be erased, because this new functionality
> won't be mutually exclusive with the existing one; e.g., you can check
> /sys/kernel/debug/page_owner AND /sys/kernel/debug/page_owner_stacks.
>
> So, while the new functionality won't care whether a record has been
> erased, the old one will, and information would be lost.
>
> [1] https://patchwork.kernel.org/project/linux-mm/cover/20231120084300.4368-1-osalvador@suse.de/
>
> --
> Oscar Salvador
> SUSE Labs
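A minimal sketch of the direction Marco suggests, assuming struct stack_record (and its `count` field) is made visible to page_owner as discussed below; the helper name is invented for illustration:

```c
/*
 * Sketch only, not code from the series: page_owner decrements the
 * refcount itself instead of calling stack_depot_put(), so the record
 * is never unlinked from the hash table or freed.
 */
static void dec_stack_record_count(struct stack_record *stack)
{
	if (stack)
		refcount_dec(&stack->count);
}
```

Because only stack_depot_put() unlinks and frees records, decrementing the count directly can never evict anything, which is exactly the property page_owner needs here.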
On Thu, Jan 04, 2024 at 10:25:40AM +0100, Marco Elver wrote:
> I think a boolean makes the interface more confusing for everyone
> else. At that point stack_depot_put merely decrements the refcount and
> becomes a wrapper around refcount_dec, right?

Thanks, Marco, for the feedback. Fair enough.

> I think you want to expose the stack_record struct anyway for your
> series, so why not simply avoid calling stack_depot_put and decrement
> the refcount with your own helper (there needs to be a new stackdepot
> function to return a stack_record under the pool_rwlock held as
> reader).

Yeah, that was something I was experimenting with in my last version.
See [0]: I moved the "stack_record" struct into the header so page_owner
can make sense of it. I guess that's fine, right?

If so, I'd do as you mentioned and just decrement the refcount with my
own helper, so no calls to stack_depot_put will be needed.

Regarding the locking, I have yet to check the patch that implements the
read/write lock, but given that page_owner won't be evicting anything,
do I still have to fiddle with the locks?

> Also, you need to ensure no one else calls stack_depot_put on the stack
> traces you want to keep. If there is a risk someone else may call
> stack_depot_put on them, it obviously won't work (I think the only
> option then is to introduce a way to pin stacks).

Well, since page_owner won't call stack_depot_put, I don't see how
someone else would be able to interfere there, so I think I am safe.

[0] https://patchwork.kernel.org/project/linux-mm/patch/20231120084300.4368-3-osalvador@suse.de/
On Thu, 4 Jan 2024 at 11:18, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, Jan 04, 2024 at 10:25:40AM +0100, Marco Elver wrote:
> > I think a boolean makes the interface more confusing for everyone
> > else. At that point stack_depot_put merely decrements the refcount and
> > becomes a wrapper around refcount_dec, right?
>
> Thanks, Marco, for the feedback. Fair enough.
>
> > I think you want to expose the stack_record struct anyway for your
> > series, so why not simply avoid calling stack_depot_put and decrement
> > the refcount with your own helper (there needs to be a new stackdepot
> > function to return a stack_record under the pool_rwlock held as
> > reader).
>
> Yeah, that was something I was experimenting with in my last version.
> See [0]: I moved the "stack_record" struct into the header so page_owner
> can make sense of it. I guess that's fine, right?

Not exposing the internals would be better, but at this point I think
your use case looks like it's doing something that is somewhat out of
the bounds of the stackdepot API. I also don't see that it makes sense
to add more helpers to the stackdepot API to deal with such special
cases. As such, I'd reason that it's ok to expose the struct for this
special use case.

> If so, I'd do as you mentioned and just decrement the refcount with my
> own helper, so no calls to stack_depot_put will be needed.
>
> Regarding the locking, I have yet to check the patch that implements the
> read/write lock, but given that page_owner won't be evicting anything,
> do I still have to fiddle with the locks?

You need to grab the lock as a reader to fetch the stack_record and
return it. All that should be hidden behind whatever function you'll
introduce to return the stack_record (stack_depot_find()?).

> > Also, you need to ensure no one else calls stack_depot_put on the stack
> > traces you want to keep. If there is a risk someone else may call
> > stack_depot_put on them, it obviously won't work (I think the only
> > option then is to introduce a way to pin stacks).
>
> Well, since page_owner won't call stack_depot_put, I don't see how
> someone else would be able to interfere there, so I think I am safe.
>
> [0] https://patchwork.kernel.org/project/linux-mm/patch/20231120084300.4368-3-osalvador@suse.de/
>
> --
> Oscar Salvador
> SUSE Labs
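A sketch of what such a lookup helper might look like, using the "stack_depot_find()" name floated above; hypothetical code, not part of the series as posted:

```c
/*
 * Hypothetical helper, not code from the series: return the
 * stack_record for a handle, fetching it with pool_rwlock held as
 * reader (which satisfies the relaxed lockdep assertion in
 * depot_fetch_stack() after the patch below).
 */
struct stack_record *stack_depot_find(depot_stack_handle_t handle)
{
	struct stack_record *stack;
	unsigned long flags;

	if (!handle || stack_depot_disabled)
		return NULL;

	read_lock_irqsave(&pool_rwlock, flags);
	stack = depot_fetch_stack(handle);
	read_unlock_irqrestore(&pool_rwlock, flags);

	return stack;
}
```

Note that returning the record after the read lock is dropped is only safe under the assumption stated in the thread: nothing ever calls stack_depot_put() on the records page_owner tracks, so they cannot be evicted underneath the caller.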
```diff
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 611716702d73..a6796f178913 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -97,6 +97,8 @@ static inline int stack_depot_early_init(void) { return 0; }
  *
  * If STACK_DEPOT_FLAG_GET is set in @depot_flags, stack depot will increment
  * the refcount on the saved stack trace if it already exists in stack depot.
+ * Users of this flag must also call stack_depot_put() when keeping the stack
+ * trace is no longer required to avoid overflowing the refcount.
  *
  * If the provided stack trace comes from the interrupt context, only the part
  * up to the interrupt entry is saved.
@@ -162,6 +164,18 @@ void stack_depot_print(depot_stack_handle_t stack);
 int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
 			int spaces);
 
+/**
+ * stack_depot_put - Drop a reference to a stack trace from stack depot
+ *
+ * @handle: Stack depot handle returned from stack_depot_save()
+ *
+ * The stack trace is evicted from stack depot once all references to it have
+ * been dropped (once the number of stack_depot_put() calls matches the
+ * number of stack_depot_save_flags() calls with STACK_DEPOT_FLAG_GET set for
+ * this stack trace).
+ */
+void stack_depot_put(depot_stack_handle_t handle);
+
 /**
  * stack_depot_set_extra_bits - Set extra bits in a stack depot handle
  *
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 911dee11bf39..c1b31160f4b4 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -394,7 +394,7 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
-	lockdep_assert_held_read(&pool_rwlock);
+	lockdep_assert_held(&pool_rwlock);
 
 	if (parts.pool_index > pools_num) {
 		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
@@ -410,6 +410,14 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	return stack;
 }
 
+/* Links stack into the freelist. */
+static void depot_free_stack(struct stack_record *stack)
+{
+	lockdep_assert_held_write(&pool_rwlock);
+
+	list_add(&stack->list, &free_stacks);
+}
+
 /* Calculates the hash for a stack. */
 static inline u32 hash_stack(unsigned long *entries, unsigned int size)
 {
@@ -592,6 +600,33 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
+void stack_depot_put(depot_stack_handle_t handle)
+{
+	struct stack_record *stack;
+	unsigned long flags;
+
+	if (!handle || stack_depot_disabled)
+		return;
+
+	write_lock_irqsave(&pool_rwlock, flags);
+
+	stack = depot_fetch_stack(handle);
+	if (WARN_ON(!stack))
+		goto out;
+
+	if (refcount_dec_and_test(&stack->count)) {
+		/* Unlink stack from the hash table. */
+		list_del(&stack->list);
+
+		/* Free stack. */
+		depot_free_stack(stack);
+	}
+
+out:
+	write_unlock_irqrestore(&pool_rwlock, flags);
+}
+EXPORT_SYMBOL_GPL(stack_depot_put);
+
 void stack_depot_print(depot_stack_handle_t stack)
 {
 	unsigned long *entries;
```
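To tie the two hunks together, the calling pattern described by the new kernel-doc looks roughly like this. This is a usage sketch, not code from the series; it assumes the stack_depot_save_flags() signature and the STACK_DEPOT_FLAG_CAN_ALLOC flag introduced earlier in the series:

```c
#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

/* Usage sketch: save a stack trace while taking a reference on it. */
static depot_stack_handle_t get_stack_handle(gfp_t gfp)
{
	unsigned long entries[16];
	unsigned int nr_entries;

	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
	/* STACK_DEPOT_FLAG_GET increments the refcount of an existing record. */
	return stack_depot_save_flags(entries, nr_entries, gfp,
				      STACK_DEPOT_FLAG_CAN_ALLOC |
				      STACK_DEPOT_FLAG_GET);
}

static void put_stack_handle(depot_stack_handle_t handle)
{
	/* The record is evicted once the last reference is dropped. */
	stack_depot_put(handle);
}
```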