[0/2] Introduce panic function when slub leaks

Message ID

20240925032256.1782-1-fangzheng.zhang@unisoc.com (mailing list archive)

Headers

From: Fangzheng Zhang <fangzheng.zhang@unisoc.com>
To: Christoph Lameter <cl@linux.com>, Pekka Enberg <penberg@kernel.org>,
        David Rientjes <rientjes@google.com>,
        Joonsoo Kim <iamjoonsoo.kim@lge.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Vlastimil Babka <vbabka@suse.cz>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Hyeonggon Yoo <42.hyeyoo@gmail.com>,
        Greg KH <gregkh@linuxfoundation.org>
CC: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <tkjos@google.com>,
        Fangzheng Zhang <fangzheng.zhang@unisoc.com>,
        Fangzheng Zhang <fangzheng.zhang1003@gmail.com>,
        Yuming Han <yuming.han@unisoc.com>
Subject: [PATCH 0/2] Introduce panic function when slub leaks
Date: Wed, 25 Sep 2024 11:22:54 +0800
Message-ID: <20240925032256.1782-1-fangzheng.zhang@unisoc.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

Introduce panic function when slub leaks

Message

Fangzheng Zhang Sept. 25, 2024, 3:22 a.m. UTC

Hi all,

This series adds a method to detect slub leaks by monitoring slab usage
in real time on the slub page allocation path. When slab occupancy
exceeds a user-set threshold, the slab is considered to be leaking,
and a panic is triggered immediately.
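As a rough illustration of the check described above, the following Python sketch models the accounting in user space. It is not the patch's code (the series changes mm/slub.c in C, and its internals are not shown here); the class SlabLeakModel, its methods, the page-count units and the percentage threshold are all hypothetical, and raising SlabLeakPanic merely stands in for the kernel's panic().

```python
class SlabLeakPanic(RuntimeError):
    """Stands in for panic() in this hypothetical model."""


class SlabLeakModel:
    """Hypothetical model of a threshold check on the slab page allocation path.

    Slab usage is tracked in pages; the threshold is a percentage of total
    memory pages, playing the role of the user-set value.
    """

    def __init__(self, total_pages: int, threshold_pct: int) -> None:
        if total_pages <= 0:
            raise ValueError("total_pages must be positive")
        if not 0 < threshold_pct <= 100:
            raise ValueError("threshold_pct must be in (0, 100]")
        self.total_pages = total_pages
        self.threshold_pct = threshold_pct
        self.slab_pages = 0

    def on_new_slab(self, order: int) -> None:
        """Account a new slab of 2**order pages, then check the threshold."""
        self.slab_pages += 1 << order
        # Integer comparison of usage% > threshold%, no floating point needed.
        if self.slab_pages * 100 > self.total_pages * self.threshold_pct:
            raise SlabLeakPanic(
                f"slab uses {self.slab_pages}/{self.total_pages} pages, "
                f"over {self.threshold_pct}%"
            )

    def on_free_slab(self, order: int) -> None:
        """Account a slab of 2**order pages returned to the page allocator."""
        self.slab_pages -= 1 << order
```

In this model the check runs only when a new slab is allocated, so occupancy is evaluated exactly at the moment more slab memory is requested and never on the free path.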

Fangzheng Zhang (2):
  mm/slub: Add panic function when slub leaks
  Documentation: admin-guide: kernel-parameters: Add parameter
    description for slub_leak_panic function

 .../admin-guide/kernel-parameters.txt         | 15 ++++
 mm/Kconfig                                    | 11 ++++++++
 mm/slub.c                                     | 76 +++++++++++++++++++

 3 files changed, 102 insertions(+)

Comments

Hyeonggon Yoo Sept. 25, 2024, 1:18 p.m. UTC | #1

On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang
<fangzheng.zhang@unisoc.com> wrote:
>
> Hi all,

Hi Fangzheng,

> A method to detect slub leaks by monitoring its usage in real time
> on the page allocation path of the slub. When the slub occupancy
> exceeds the user-set value, it is considered that the slub is leaking
> at this time

I'm not sure why this should be a kernel feature. Why not write a user
script that parses the MemTotal: and Slab: fields of /proc/meminfo and
generates a log entry or an alarm?
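A minimal sketch of the user-space alternative suggested here, assuming Python 3.9+ and the usual "Name:   value kB" layout of /proc/meminfo. The function names and the 10% default threshold are illustrative choices, not taken from the thread.

```python
import logging


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field name to its first numeric value (kB for most fields)."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def slab_percent(fields: dict[str, int]) -> float:
    """Return Slab as a percentage of MemTotal."""
    return 100.0 * fields["Slab"] / fields["MemTotal"]


def check_slab(threshold_pct: float = 10.0, path: str = "/proc/meminfo") -> bool:
    """Log a warning and return True if slab usage exceeds threshold_pct."""
    with open(path) as f:
        pct = slab_percent(parse_meminfo(f.read()))
    if pct > threshold_pct:
        logging.warning("Slab is %.1f%% of MemTotal (threshold %.1f%%)",
                        pct, threshold_pct)
        return True
    return False
```

Calling check_slab() periodically (for example from cron or a systemd timer) and forwarding the warning to the test setup's alarm mechanism gives the same signal without stopping the kernel.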

> and a panic operation will be triggered immediately.

I don't think it would be a good idea to panic unnecessarily.
IMO it is not proper to panic when the kernel can still run.

Any thoughts?

Thanks,
Hyeonggon

Vlastimil Babka Sept. 26, 2024, 12:30 p.m. UTC | #2

On 9/25/24 15:18, Hyeonggon Yoo wrote:
> On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang
> <fangzheng.zhang@unisoc.com> wrote:
>>
>> Hi all,
> 
> Hi Fangzheng,
> 
>> A method to detect slub leaks by monitoring its usage in real time
>> on the page allocation path of the slub. When the slub occupancy
>> exceeds the user-set value, it is considered that the slub is leaking
>> at this time
> 
> I'm not sure why this should be a kernel feature. Why not write a user
> script that parses
> MemTotal: and Slab: part of /proc/meminfo file and generates a log
> entry or an alarm?

Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
when that's used by caches with shrinkers?
A userspace solution should be straightforward and universal - easily
configurable for different scenarios.

>> and a panic operation will be triggered immediately.
> 
> I don't think it would be a good idea to panic unnecessarily.
> IMO it is not proper to panic when the kernel can still run.

Yes, these days it's practically impossible to add a BUG_ON() even for
conditions more serious than this.

Please don't post new versions addressing specific implementation details
until this fundamental issue is addressed.

Thanks,
Vlastimil

zhang fangzheng Sept. 27, 2024, 7:28 a.m. UTC | #3

On Thu, Sep 26, 2024 at 8:30 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
> kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
> when that's used by caches with shrinkers?

OK, this is because the current goal is specifically to track the
memory usage of the slab allocator.
In stability testing (i.e., monkey testing), when an ANR (Application
Not Responding) or reboot problem occurs, memory analysis shows high
slab occupancy with high probability.
Besides monitoring leaks directly in the allocation path, this approach
also makes it convenient to record the allocation stack when an
exception occurs.

Hyeonggon Yoo Sept. 27, 2024, 8:01 a.m. UTC | #4

On Fri, Sep 27, 2024 at 4:28 PM zhang fangzheng
<fangzheng.zhang1003@gmail.com> wrote:
>
> On Thu, Sep 26, 2024 at 8:30 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
> > kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
> > when that's used by caches with shrinkers?
>
> Ok, this is because the current consideration is to specifically
> track the memory usage of the slab module.
> In the stability test, ie, monkey test,
> the anr or reboot problem occurs, there is a high probability
> that the slab occupancy is high when it comes to memory analysis.
> In addition to directly monitoring leaks in the allocation path, it is
> also convenient to record the allocation stack information
> when an exception occurs.

[+Cc Memory Allocation Profiling maintainers]

For recording allocation information, I think CONFIG_MEM_ALLOC_PROFILING [1] [2]
may be used to track the allocation sites that contribute to memory leaks,
instead of making the kernel panic or print a WARNING?

...Or, with higher overhead, slub_debug=U [3] if this is not meant
to run in production.

[1] https://docs.kernel.org/mm/allocation-profiling.html
[2] https://lwn.net/Articles/974380
[3] https://docs.kernel.org/mm/slub.html#debugfs-files-for-slub
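As a sketch of how profiling output could be used for leak hunting instead of a panic, the code below compares two snapshots of /proc/allocinfo and reports the allocation sites whose allocated bytes grew the most. It assumes CONFIG_MEM_ALLOC_PROFILING is enabled and that each data line starts with a byte count and a call count followed by the tag (file:line, an optional module, func:name), as in the example output in [1]; the function names parse_allocinfo and top_growth are hypothetical.

```python
def parse_allocinfo(text: str) -> dict[str, int]:
    """Map each allocation-site tag to its currently allocated bytes.

    Lines that do not start with two integers (version and column headers)
    are skipped.
    """
    sizes: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        sizes[" ".join(parts[2:])] = int(parts[0])
    return sizes


def top_growth(before: str, after: str, n: int = 10) -> list[tuple[str, int]]:
    """Return up to n (tag, byte growth) pairs, largest growth first."""
    old, new = parse_allocinfo(before), parse_allocinfo(after)
    growth = [(tag, size - old.get(tag, 0)) for tag, size in new.items()]
    growth = [g for g in growth if g[1] > 0]
    growth.sort(key=lambda g: g[1], reverse=True)
    return growth[:n]
```

Taking the two snapshots some minutes apart during a monkey test would point at the call sites responsible for slab growth while the machine keeps running.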

Best,
Hyeonggon

韩玉明 (Yuming Han) Oct. 9, 2024, 1:25 a.m. UTC | #5

?loop  shuo.tian