mbox series

[v7,0/4] page_owner: print stacks and their outstanding allocations

Message ID 20240208234539.19113-1-osalvador@suse.de (mailing list archive)
Headers show
Series page_owner: print stacks and their outstanding allocations | expand

Message

Oscar Salvador Feb. 8, 2024, 11:45 p.m. UTC
Changes v6 -> v7:
     - Rebased on top of Andrey Konovalov's libstackdepot patchset
     - Reformulated the changelogs

Changes v5 -> v6:
     - Rebase on top of v6.7-rc1
     - Move stack_record struct to the header
     - Addressed feedback from Vlastimil
       (some code tweaks and changelogs suggestions)

Changes v4 -> v5:
     - Addressed feedback from Alexander Potapenko

Changes v3 -> v4:
     - Rebase (long time has passed)
     - Use boolean instead of enum for action by Alexander Potapenko
     - (I left some feedback untouched because it's been long and
        would like to discuss it here now instead of re-vamping
        and old thread)

Changes v2 -> v3:
     - Replace interface in favor of seq operations
       (suggested by Vlastimil)
     - Use debugfs interface to store/read valued (suggested by Ammar)


page_owner is a great debug functionality tool that lets us know
about all pages that have been allocated/freed and their specific
stacktrace.
This comes very handy when debugging memory leaks, since with
some scripting we can see the outstanding allocations, which might point
to a memory leak.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship, becoming most of the time a
daunting and slow process when we have tons of allocation/free operations. 

This patchset aims to ease that by adding a new functionality into
page_owner.
This functionality creates a new read-only file called "page_owner_stacks",
which prints out all the stacks followed by their outstanding number
of allocations (being that the times the stacktrace has allocated
but not freed yet).
This gives us a clear and a quick overview of stacks <-> allocated/free.

We take advantage of the new refcount_f field that stack_record struct
gained, and increment/decrement the stack refcount on every
__set_page_owner() (alloc operation) and __reset_page_owner (free operation)
call.

Unfortunately, we cannot use the new stackdepot api
STACK_DEPOT_FLAG_{GET,PUT} because it does not fulfill page_owner needs,
meaning we would have to special case things, at which point
makes more sense for page_owner to do its own {dec,inc}rementing
of the stacks.
E.g: Using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0,
such stack gets evicted, so page_owner would lose information.

This patch also creates a new file called 'page_owner_threshold'.
By writing a value to it, the stacks which refcount is below such
value will be filtered out.

In order to better exercise the path in stack_depot_get_next_stack(),
I artificially filled the buckets with more than one stack, making sure
I was getting all of then when reading from it.

On a side note, stack_depot_get_next_stack() could be somehow reconstructed
to be in page_owner code, but we would have to move stack_table
into the header, so page_owner can access it.
I can do that if that's preferred, so stackdepot.c would not get "poluted".

A PoC can be found below:

 # cat /sys/kernel/debug/page_owner_stacks > page_owner_full_stacks.txt
 # head -40 page_owner_full_stacks.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  page_cache_ra_unbounded+0x96/0x180
  filemap_get_pages+0xfd/0x590
  filemap_read+0xcc/0x330
  blkdev_read_iter+0xb8/0x150
  vfs_read+0x285/0x320
  ksys_read+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 521



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_write+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 4609
...
...

 # echo 5000 > /sys/kernel/debug/page_owner_threshold 
 # cat /sys/kernel/debug/page_owner_stacks > page_owner_full_stacks_5000.txt
 # head -40 page_owner_full_stacks_5000.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_pwrite64+0x75/0x90
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 6781



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  pcpu_populate_chunk+0xec/0x350
  pcpu_balance_workfn+0x2d1/0x4a0
  process_scheduled_works+0x84/0x380
  worker_thread+0x12a/0x2a0
  kthread+0xe3/0x110
  ret_from_fork+0x30/0x50
  ret_from_fork_asm+0x1b/0x30
 stack_count: 8641

Oscar Salvador (4):
  lib/stackdepot: Move stack_record struct definition into the header
  mm,page_owner: Implement the tracking of the stacks count
  mm,page_owner: Display all stacks and their count
  mm,page_owner: Filter out stacks by a threshold

 include/linux/stackdepot.h |  72 ++++++++++++++++++++
 lib/stackdepot.c           |  97 ++++++++++++++------------
 mm/page_owner.c            | 136 +++++++++++++++++++++++++++++++++++++
 3 files changed, 262 insertions(+), 43 deletions(-)

Comments

Andrew Morton Feb. 9, 2024, 12:28 a.m. UTC | #1
On Fri,  9 Feb 2024 00:45:35 +0100 Oscar Salvador <osalvador@suse.de> wrote:

> page_owner is a great debug functionality tool that lets us know
> about all pages that have been allocated/freed and their specific
> stacktrace.
> This comes very handy when debugging memory leaks, since with
> some scripting we can see the outstanding allocations, which might point
> to a memory leak.
> 
> In my experience, that is one of the most useful cases, but it can get
> really tedious to screen through all pages and try to reconstruct the
> stack <-> allocated/freed relationship, becoming most of the time a
> daunting and slow process when we have tons of allocation/free operations. 
> 
> This patchset aims to ease that by adding a new functionality into
> page_owner.
> This functionality creates a new read-only file called "page_owner_stacks",

The full path would be appreciated.

> which prints out all the stacks followed by their outstanding number
> of allocations (being that the times the stacktrace has allocated
> but not freed yet).
> This gives us a clear and a quick overview of stacks <-> allocated/free.
> 
> We take advantage of the new refcount_f field that stack_record struct
> gained, and increment/decrement the stack refcount on every
> __set_page_owner() (alloc operation) and __reset_page_owner (free operation)
> call.
> 
> Unfortunately, we cannot use the new stackdepot api
> STACK_DEPOT_FLAG_{GET,PUT} because it does not fulfill page_owner needs,
> meaning we would have to special case things, at which point
> makes more sense for page_owner to do its own {dec,inc}rementing
> of the stacks.
> E.g: Using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0,
> such stack gets evicted, so page_owner would lose information.
> 
> This patch also creates a new file called 'page_owner_threshold'.
> By writing a value to it, the stacks which refcount is below such
> value will be filtered out.
> 
> In order to better exercise the path in stack_depot_get_next_stack(),
> I artificially filled the buckets with more than one stack, making sure
> I was getting all of then when reading from it.
> 
> On a side note, stack_depot_get_next_stack() could be somehow reconstructed
> to be in page_owner code, but we would have to move stack_table
> into the header, so page_owner can access it.
> I can do that if that's preferred, so stackdepot.c would not get "poluted".
> 
> A PoC can be found below:
> 
>  # cat /sys/kernel/debug/page_owner_stacks > page_owner_full_stacks.txt

Oh, there it is.  I wonder why we didn't use /sys/kernel/mm/

Would a new /sys/kernel/debug/page_owner_something/ be a good idea?  We
might add more things later.  Then it can be
/sys/kernel/debug/page_owner_something/full_stacks. 
/sys/kernel/debug/page_owner/ would be nice, but it's too late for
that.

> Oscar Salvador (4):
>   lib/stackdepot: Move stack_record struct definition into the header
>   mm,page_owner: Implement the tracking of the stacks count
>   mm,page_owner: Display all stacks and their count
>   mm,page_owner: Filter out stacks by a threshold
> 
>  include/linux/stackdepot.h |  72 ++++++++++++++++++++
>  lib/stackdepot.c           |  97 ++++++++++++++------------
>  mm/page_owner.c            | 136 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 262 insertions(+), 43 deletions(-)

Documentation/mm/page_owner.rst?
Marco Elver Feb. 9, 2024, 8:03 a.m. UTC | #2
On Fri, 9 Feb 2024 at 00:45, Oscar Salvador <osalvador@suse.de> wrote:
>
> Changes v6 -> v7:
>      - Rebased on top of Andrey Konovalov's libstackdepot patchset
>      - Reformulated the changelogs

Overall looks fine, but I think it's not rebased against the latest
stackdepot version in -next. That version re-introduces variable-sized
records and there's some trickery required so you do not end up with
refcount warnings.
Oscar Salvador Feb. 9, 2024, 9:31 p.m. UTC | #3
On Thu, Feb 08, 2024 at 04:28:18PM -0800, Andrew Morton wrote:
> Oh, there it is.  I wonder why we didn't use /sys/kernel/mm/

I just followed the convention we use at the moment, which happens
to be /sys/kernel/debug/page_owner_xxx.

> Would a new /sys/kernel/debug/page_owner_something/ be a good idea?  We
> might add more things later.  Then it can be
> /sys/kernel/debug/page_owner_something/full_stacks. 
> /sys/kernel/debug/page_owner/ would be nice, but it's too late for
> that.

Well, we could certainly do that, so we do not scatter the files
in /sys/kernel/debug/ but rather gather them in pagE_owner directory.
Let me see if I can come up with a proper name.

> > Oscar Salvador (4):
> >   lib/stackdepot: Move stack_record struct definition into the header
> >   mm,page_owner: Implement the tracking of the stacks count
> >   mm,page_owner: Display all stacks and their count
> >   mm,page_owner: Filter out stacks by a threshold
> > 
> >  include/linux/stackdepot.h |  72 ++++++++++++++++++++
> >  lib/stackdepot.c           |  97 ++++++++++++++------------
> >  mm/page_owner.c            | 136 +++++++++++++++++++++++++++++++++++++
> >  3 files changed, 262 insertions(+), 43 deletions(-)
> 
> Documentation/mm/page_owner.rst?

Heh, we are definitely going to need some documentation.
Let me prepare something.

Thanks Andrew
Oscar Salvador Feb. 9, 2024, 9:32 p.m. UTC | #4
On Fri, Feb 09, 2024 at 09:03:21AM +0100, Marco Elver wrote:
> On Fri, 9 Feb 2024 at 00:45, Oscar Salvador <osalvador@suse.de> wrote:
> >
> > Changes v6 -> v7:
> >      - Rebased on top of Andrey Konovalov's libstackdepot patchset
> >      - Reformulated the changelogs
> 
> Overall looks fine, but I think it's not rebased against the latest
> stackdepot version in -next. That version re-introduces variable-sized
> records and there's some trickery required so you do not end up with
> refcount warnings.

Thanks a lot Marco for having a look and for the feedback, that's highly
appreciated.