Message ID | 20240925032256.1782-1-fangzheng.zhang@unisoc.com (mailing list archive) |
---|---|
Series | Introduce panic function when slub leaks |

On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang <fangzheng.zhang@unisoc.com> wrote:
>
> Hi all,

Hi Fangzheng,

> A method to detect slub leaks by monitoring its usage in real time
> on the page allocation path of the slub. When the slub occupancy
> exceeds the user-set value, it is considered that the slub is leaking
> at this time

I'm not sure why this should be a kernel feature. Why not write a user
script that parses
MemTotal: and Slab: part of /proc/meminfo file and generates a log
entry or an alarm?

> and a panic operation will be triggered immediately.

I don't think it would be a good idea to panic unnecessarily.
IMO it is not proper to panic when the kernel can still run.

Any thoughts?

Thanks,
Hyeonggon
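
The userspace check suggested above could look roughly like the following. This is only a minimal sketch, not something posted in the thread: the function names and the 30% default threshold are made up for illustration, and it assumes the usual "Name:   value kB" layout of /proc/meminfo.

```python
import logging


def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_percent(fields):
    """Slab usage as a percentage of MemTotal."""
    return 100.0 * fields["Slab"] / fields["MemTotal"]


def check_slab(text, threshold_percent=30.0):
    """Log a warning and return True if Slab exceeds the (hypothetical) threshold."""
    pct = slab_percent(parse_meminfo(text))
    if pct > threshold_percent:
        logging.warning("Slab is %.1f%% of MemTotal (threshold %.1f%%)",
                        pct, threshold_percent)
        return True
    return False


def check_system(path="/proc/meminfo", threshold_percent=30.0):
    """Run the check against the live system; meant to be called periodically."""
    with open(path) as f:
        return check_slab(f.read(), threshold_percent)
```

Calling check_system() from a cron job or a simple polling loop would give the log entry or alarm without any kernel change, and the same parser can watch other /proc/meminfo counters by swapping the field name.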

On 9/25/24 15:18, Hyeonggon Yoo wrote:
> On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang
> <fangzheng.zhang@unisoc.com> wrote:
>>
>> Hi all,
>
> Hi Fangzheng,
>
>> A method to detect slub leaks by monitoring its usage in real time
>> on the page allocation path of the slub. When the slub occupancy
>> exceeds the user-set value, it is considered that the slub is leaking
>> at this time
>
> I'm not sure why this should be a kernel feature. Why not write a user
> script that parses
> MemTotal: and Slab: part of /proc/meminfo file and generates a log
> entry or an alarm?

Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
when that's used by caches with shrinkers?

A userspace solution should be straightforward and universal - easily
configurable for different scenarios.

>> and a panic operation will be triggered immediately.
>
> I don't think it would be a good idea to panic unnecessarily.
> IMO it is not proper to panic when the kernel can still run.

Yes these days it's practically impossible to add a BUG_ON() for more
serious conditions than this.

Please don't post new versions addressing specific implementation details
until this fundamental issue is addressed.

Thanks,
Vlastimil

> Any thoughts?
>
> Thanks,
> Hyeonggon

On Thu, Sep 26, 2024 at 8:30 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/25/24 15:18, Hyeonggon Yoo wrote:
> > On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang
> > <fangzheng.zhang@unisoc.com> wrote:
> >>
> >> Hi all,
> >
> > Hi Fangzheng,
> >
> >> A method to detect slub leaks by monitoring its usage in real time
> >> on the page allocation path of the slub. When the slub occupancy
> >> exceeds the user-set value, it is considered that the slub is leaking
> >> at this time
> >
> > I'm not sure why this should be a kernel feature. Why not write a user
> > script that parses
> > MemTotal: and Slab: part of /proc/meminfo file and generates a log
> > entry or an alarm?
>
> Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
> kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
> when that's used by caches with shrinkers?

Ok, this is because the current consideration is to specifically
track the memory usage of the slab module.
In the stability test, ie, monkey test,
the anr or reboot problem occurs, there is a high probability
that the slab occupancy is high when it comes to memory analysis.
In addition to directly monitoring leaks in the allocation path, it is
also convenient to record the allocation stack information
when an exception occurs.

> A userspace solution should be straightforward and universal - easily
> configurable for different scenarios.
>
> >> and a panic operation will be triggered immediately.
> >
> > I don't think it would be a good idea to panic unnecessarily.
> > IMO it is not proper to panic when the kernel can still run.
>
> Yes these days it's practically impossible to add a BUG_ON() for more
> serious conditions than this.
>
> Please don't post new versions addressing specific implementation details
> until this fundamental issue is addressed.
>
> Thanks,
> Vlastimil
>
> > Any thoughts?
> >
> > Thanks,
> > Hyeonggon
>

On Fri, Sep 27, 2024 at 4:28 PM zhang fangzheng <fangzheng.zhang1003@gmail.com> wrote:
>
> On Thu, Sep 26, 2024 at 8:30 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 9/25/24 15:18, Hyeonggon Yoo wrote:
> > > On Wed, Sep 25, 2024 at 12:23 PM Fangzheng Zhang
> > > <fangzheng.zhang@unisoc.com> wrote:
> > >>
> > >> Hi all,
> > >
> > > Hi Fangzheng,
> > >
> > >> A method to detect slub leaks by monitoring its usage in real time
> > >> on the page allocation path of the slub. When the slub occupancy
> > >> exceeds the user-set value, it is considered that the slub is leaking
> > >> at this time
> > >
> > > I'm not sure why this should be a kernel feature. Why not write a user
> > > script that parses
> > > MemTotal: and Slab: part of /proc/meminfo file and generates a log
> > > entry or an alarm?
> >
> > Yes very much agreed. It seems rather arbitrary. Why slab, why not any other
> > kernel-specific counter in /proc/meminfo? Why include NR_SLAB_RECLAIMABLE_B
> > when that's used by caches with shrinkers?
>
> Ok, this is because the current consideration is to specifically
> track the memory usage of the slab module.
> In the stability test, ie, monkey test,
> the anr or reboot problem occurs, there is a high probability
> that the slab occupancy is high when it comes to memory analysis.
> In addition to directly monitoring leaks in the allocation path, it is
> also convenient to record the allocation stack information
> when an exception occurs.

[+Cc Memory Allocation Profiling maintainers]

For recording allocation information, I think CONFIG_MEM_ALLOC_PROFILING [1] [2]
may be used to track allocation sites that contribute to memory leaks,
instead of making the kernel panic or printing WARNING?

.....Or with higher overhead, slub_debug=U [3] if it is not meant to be
run on production.

[1] https://docs.kernel.org/mm/allocation-profiling.html
[2] https://lwn.net/Articles/974380
[3] https://docs.kernel.org/mm/slub.html#debugfs-files-for-slub

Best,
Hyeonggon

> > A userspace solution should be straightforward and universal - easily
> > configurable for different scenarios.
> >
> > >> and a panic operation will be triggered immediately.
> > >
> > > I don't think it would be a good idea to panic unnecessarily.
> > > IMO it is not proper to panic when the kernel can still run.
> >
> > Yes these days it's practically impossible to add a BUG_ON() for more
> > serious conditions than this.
> >
> > Please don't post new versions addressing specific implementation details
> > until this fundamental issue is addressed.
> >
> > Thanks,
> > Vlastimil
> >
> > > Any thoughts?
> > >
> > > Thanks,
> > > Hyeonggon
> >
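
With CONFIG_MEM_ALLOC_PROFILING [1] enabled, the largest allocation sites can be pulled out of /proc/allocinfo from userspace. The sketch below is illustrative only and was not posted in the thread: it assumes each data line has the form "<size> <calls> <tag info>" as described in [1], skips any line that does not start with two integers (such as header lines), and the function name is hypothetical.

```python
def top_alloc_sites(text, n=5):
    """Return the n largest (bytes, calls, tag) entries from /proc/allocinfo text."""
    sites = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Data lines are assumed to be "<size> <calls> <tag info>";
        # anything else (e.g. a header) is skipped.
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        sites.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    sites.sort(key=lambda site: site[0], reverse=True)
    return sites[:n]


def top_alloc_sites_from_system(path="/proc/allocinfo", n=5):
    """Read the live file; it only exists when allocation profiling is built in."""
    with open(path) as f:
        return top_alloc_sites(f.read(), n)
```

Logging this output when a userspace slab threshold is crossed would record which call sites hold the memory, which covers the "record allocation information" goal without a panic.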