Message ID | 20190727132334.9184-1-catalin.marinas@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] mm: kmemleak: Use mempool allocations for kmemleak objects | expand |
On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > Add mempool allocations for struct kmemleak_object and > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > under memory pressure. Additionally, mask out all the gfp flags passed > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > different minimum pool size (defaulting to NR_CPUS * 4). Why would anyone ever want to alter this? Is there some particular misbehaviour which this will improve? If so, what is it? > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -2011,6 +2011,12 @@ > Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, > the default is off. > > + kmemleak.mempool= > + [KNL] Boot-time tuning of the minimum kmemleak > + metadata pool size. > + Format: <int> > + Default: NR_CPUS * 4 > + This is the only documentation we provide people and it doesn't really explain anything at all. IOW, can we do a better job of explaining all this to the target audience? Why does the min size need to be tunable anyway?
On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > Add mempool allocations for struct kmemleak_object and > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > under memory pressure. Additionally, mask out all the gfp flags passed > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > different minimum pool size (defaulting to NR_CPUS * 4). btw, the checkpatch warnings are valid: WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc #70: FILE: mm/kmemleak.c:197: +static int min_object_pool = NR_CPUS * 4; WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc #71: FILE: mm/kmemleak.c:198: +static int min_scan_area_pool = NR_CPUS * 1; There can be situations where NR_CPUS is much larger than num_possible_cpus(). Can we initialize these tunables within kmemleak_init()?
On Tue, 2019-07-30 at 12:57 -0700, Andrew Morton wrote: > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> > wrote: > > > Add mempool allocations for struct kmemleak_object and > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > under memory pressure. Additionally, mask out all the gfp flags passed > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > different minimum pool size (defaulting to NR_CPUS * 4). > > Why would anyone ever want to alter this? Is there some particular > misbehaviour which this will improve? If so, what is it? So it can tolerant different systems and workloads. For example, there are some machines with slow disk and fast CPUs. When they are under memory pressure, it could take a long time to swap before the OOM kicks in to free up some memory. As the results, it needs a large mempool for kmemleak or suffering from higher chance of a kmemleak metadata allocation failure. > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -2011,6 +2011,12 @@ > > Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, > > the default is off. > > > > + kmemleak.mempool= > > + [KNL] Boot-time tuning of the minimum kmemleak > > + metadata pool size. > > + Format: <int> > > + Default: NR_CPUS * 4 > > + Catalin, BTW, it is right now unable to handle a large size. I tried to reserve 64M (kmemleak.mempool=67108864), [ 0.039254][ T0] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:4707 __alloc_pages_nodemask+0x3b8/0x1780 [ 0.039284][ T0] Modules linked in: [ 0.039309][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc2-next- 20190730+ #3 [ 0.039328][ T0] NIP: c000000000395038 LR: c0000000003d9320 CTR: 0000000000000000 [ 0.039355][ T0] REGS: c00000000170f710 TRAP: 0700 Not tainted (5.3.0- rc2-next-20190730+) [ 0.039384][ T0] MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 24000884 XER: 20040000 [ 0.039431][ T0] CFAR: c000000000394cd4 IRQMASK: 0 [ 0.039431][ T0] GPR00: c0000000003d9320 c00000000170f9a0 c000000001708c00 0000000000040cc0 [ 0.039431][ T0] GPR04: 0000000000000010 0000000000000000 0000000000000000 c000000002aac080 [ 0.039431][ T0] GPR08: 0000001ffb3a0000 0000000000000000 c0000000003d9320 0000000000000000 [ 0.039431][ T0] GPR12: 0000000024000882 c000000002760000 0000000000000000 0000000000000000 [ 0.039431][ T0] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.039431][ T0] GPR20: 0000000000000000 0000000000000001 0000000010004d9c 00000000100053ed [ 0.039431][ T0] GPR24: ffffffffffffffff ffffffffffffffff c0000000002e9544 0000000100000000 [ 0.039431][ T0] GPR28: 0000000000000cc0 0000000100000000 0000000000040cc0 c0000000027e8c48 [ 0.039646][ T0] NIP [c000000000395038] __alloc_pages_nodemask+0x3b8/0x1780 [ 0.039693][ T0] LR [c0000000003d9320] kmalloc_large_node+0x100/0x1a0 [ 0.039727][ T0] Call Trace: [ 0.039749][ T0] [c00000000170f9a0] [0000000000000001] 0x1 (unreliable) [ 0.039776][ T0] [c00000000170fbe0] [0000000000000000] 0x0 [ 0.039795][ T0] [c00000000170fc80] [c0000000003e5080] __kmalloc_node+0x520/0x890 [ 0.039816][ T0] [c00000000170fd20] [c0000000002e9544] mempool_init_node+0xb4/0x1e0 [ 0.039836][ T0] [c00000000170fd80] [c0000000002e975c] mempool_create_node+0xcc/0x150 [ 0.039857][ T0] [c00000000170fdf0] [c000000000b2a730] kmemleak_init+0x16c/0x54c [ 0.039878][ T0] [c00000000170fef0] [c000000000ae460c] start_kernel+0x69c/0x7cc [ 0.039908][ T0] [c00000000170ff90] [c00000000000a7d4] start_here_common+0x1c/0x434 [ 0.039945][ T0] Instruction dump: [ 0.039976][ T0] 4bffff14 e92d0968 39291020 3bc00001 f9210148 4bfffd98 7d435378 4bf94eed [ 0.040012][ T0] 60000000 4bfffdfc 70692000 4082ffd0 <0fe00000> 3bc00000 4bfffedc 39200000 [ 0.040049][ T0] ---[ end trace 038320b411324ff7 ]--- [ 0.040100][ T0] kmemleak: Kernel memory leak detector disabled [ 16.192449][ T1] BUG: Unable to handle kernel data access at 0xffffffffffffb2aa [ 16.192473][ T1] Faulting instruction address: 0xc000000000b2a2fc [ 16.192500][ T1] Oops: Kernel access of bad area, sig: 11 [#1] [ 16.192526][ T1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV [ 16.192567][ T1] Modules linked in: [ 16.192593][ T1] CPU: 4 PID: 1 Comm: swapper/0 Tainted: G W 5.3.0-rc2-next-20190730+ #3 [ 16.192646][ T1] NIP: c000000000b2a2fc LR: c0000000003e6e48 CTR: c0000000000b4380 [ 16.192698][ T1] REGS: c00000002aaef9d0 TRAP: 0380 Tainted: G W (5.3.0-rc2-next-20190730+) [ 16.192750][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28002884 XER: 20040000 [ 16.192801][ T1] CFAR: c00000000043769c IRQMASK: 0 [ 16.192801][ T1] GPR00: c0000000003e6e48 c00000002aaefc60 c000000001708c00 0000000000000002 [ 16.192801][ T1] GPR04: c000000002c42648 0000000000000000 0000000000000000 ffffffff00001e77 [ 16.192801][ T1] GPR08: 0000000000000000 0000000000000001 0000000000000800 0000000000000000 [ 16.192801][ T1] GPR12: 0000000000002000 c000001fffffbc00 c0000000000103d8 0000000000000000 [ 16.192801][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 16.192801][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 16.192801][ T1] GPR24: 0000000000000000 c000000002aa9c80 c0000000018d0730 c0000000003c9270 [ 16.192801][ T1] GPR28: 000000000000b100 c00c00000000b100 c000000002c42648 c000000002aa9c80 [ 16.193126][ T1] NIP [c000000000b2a2fc] log_early+0x8/0x160 [ 16.193153][ T1] LR [c0000000003e6e48] kmem_cache_free+0x428/0x740 [ 16.193190][ T1] Call Trace: [ 16.193213][ T1] [c00000002aaefc60] [0000000000000366] 0x366 (unreliable) [ 16.193243][ T1] [c00000002aaefd00] [c0000000003c9270] __mpol_put+0x50/0x70 [ 16.193272][ T1] [c00000002aaefd20] [c0000000003c9488] do_set_mempolicy+0x108/0x170 [ 16.193314][ T1] [c00000002aaefdb0] [c000000000010434] kernel_init+0x64/0x150 [ 16.193363][ T1] [c00000002aaefe20] [c00000000000b1cc] ret_from_kernel_thread+0x5c/0x70 [ 16.193412][ T1] Instruction dump: [ 16.193436][ T1] aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa [ 16.193486][ T1] aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa <aaaaaaaa> aaaaaaaa aaaaaaaa aaaaaaaa [ 16.193556][ T1] ---[ end trace 038320b411324ff9 ]--- [ 16.587204][ T1] [ 17.587316][ T1] Kernel panic - not syncing: Fatal exception
On Tue, 30 Jul 2019 16:22:37 -0400 Qian Cai <cai@lca.pw> wrote: > On Tue, 2019-07-30 at 12:57 -0700, Andrew Morton wrote: > > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> > > wrote: > > > > > Add mempool allocations for struct kmemleak_object and > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > > under memory pressure. Additionally, mask out all the gfp flags passed > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > > different minimum pool size (defaulting to NR_CPUS * 4). > > > > Why would anyone ever want to alter this? Is there some particular > > misbehaviour which this will improve? If so, what is it? > > So it can tolerant different systems and workloads. For example, there are some > machines with slow disk and fast CPUs. When they are under memory pressure, it > could take a long time to swap before the OOM kicks in to free up some memory. > As the results, it needs a large mempool for kmemleak or suffering from higher > chance of a kmemleak metadata allocation failure. This sort of thing should be in the changelog and in the user-facing documentation please. Also, we should document the user-visible effects of this failure so that users can determine whether this tunable will help them.
On Tue 30-07-19 12:57:43, Andrew Morton wrote: > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > > > Add mempool allocations for struct kmemleak_object and > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > under memory pressure. Additionally, mask out all the gfp flags passed > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > different minimum pool size (defaulting to NR_CPUS * 4). > > Why would anyone ever want to alter this? Is there some particular > misbehaviour which this will improve? If so, what is it? I do agree with Andrew here. Can we simply go with no tunning for now and only add it based on some real life reports that the auto-tuning is not sufficient?
On Sat 27-07-19 14:23:33, Catalin Marinas wrote: > Add mempool allocations for struct kmemleak_object and > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > under memory pressure. Additionally, mask out all the gfp flags passed > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > different minimum pool size (defaulting to NR_CPUS * 4). > > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Qian Cai <cai@lca.pw> > Suggested-by: Michal Hocko <mhocko@kernel.org> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> I am not familiar with the kmemleak code so I cannot really give my ack but I can give my thumbs up at least. This is definitely an improvement and step into the right direction. The gfp flags games were just broken. My only recommendation would be to drop the kernel parameter as mentioned in other email. We have just too many of them and if the current auto-tuning is not sufficient we want to hear about that and find a better one or add a parameter only if we fail. Thanks!
On Wed, Jul 31, 2019 at 11:06:53AM +0200, Michal Hocko wrote: > On Tue 30-07-19 12:57:43, Andrew Morton wrote: > > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > > Add mempool allocations for struct kmemleak_object and > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > > under memory pressure. Additionally, mask out all the gfp flags passed > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > > different minimum pool size (defaulting to NR_CPUS * 4). > > > > Why would anyone ever want to alter this? Is there some particular > > misbehaviour which this will improve? If so, what is it? > > I do agree with Andrew here. Can we simply go with no tunning for now > and only add it based on some real life reports that the auto-tuning is > not sufficient? In a first attempt earlier this year, Qian reported that an emergency pool (subsequently converted to using mempool) with the default pre-fill does not help under memory pressure: https://lore.kernel.org/linux-mm/49f77efc-8375-8fc8-aa89-9814bfbfe5bc@lca.pw/ I'm waiting for him to confirm whether the tunable in this patch helps, otherwise we can look elsewhere, maybe refilling the mempool via other means than just on free. In general, not sure we can do much under memory pressure. I'm looking at adding the kmemleak metadata to the slab itself (though I get some weird -EEXIST error in kobject_add_internal) but there are still places where the metadata needs to be allocated directly and, under OOM, this is prone to failure. I guess we'll have to live with this.
On Tue, Jul 30, 2019 at 04:22:37PM -0400, Qian Cai wrote: > On Tue, 2019-07-30 at 12:57 -0700, Andrew Morton wrote: > > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > @@ -2011,6 +2011,12 @@ > > > Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, > > > the default is off. > > > > > > + kmemleak.mempool= > > > + [KNL] Boot-time tuning of the minimum kmemleak > > > + metadata pool size. > > > + Format: <int> > > > + Default: NR_CPUS * 4 > > > + > > Catalin, BTW, it is right now unable to handle a large size. I tried to reserve > 64M (kmemleak.mempool=67108864), > > [ 0.039254][ T0] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:4707 __alloc_pages_nodemask+0x3b8/0x1780 [...] > [ 0.039646][ T0] NIP [c000000000395038] __alloc_pages_nodemask+0x3b8/0x1780 > [ 0.039693][ T0] LR [c0000000003d9320] kmalloc_large_node+0x100/0x1a0 > [ 0.039727][ T0] Call Trace: > [ 0.039795][ T0] [c00000000170fc80] [c0000000003e5080] __kmalloc_node+0x520/0x890 > [ 0.039816][ T0] [c00000000170fd20] [c0000000002e9544] mempool_init_node+0xb4/0x1e0 > [ 0.039836][ T0] [c00000000170fd80] [c0000000002e975c] mempool_create_node+0xcc/0x150 > [ 0.039857][ T0] [c00000000170fdf0] [c000000000b2a730] kmemleak_init+0x16c/0x54c > [ 0.039878][ T0] [c00000000170fef0] [c000000000ae460c] start_kernel+0x69c/0x7cc > [ 0.039908][ T0] [c00000000170ff90] [c00000000000a7d4] start_here_common+0x1c/0x434 [...] > [ 0.040100][ T0] kmemleak: Kernel memory leak detector disabled It looks like the mempool cannot be created. 64M objects means a kmalloc(512MB) for the pool array in mempool_init_node(), so that hits the MAX_ORDER warning in __alloc_pages_nodemask(). Maybe the mempool tunable won't help much for your case if you need so many objects. It's still worth having a mempool for kmemleak but we could look into changing the refill logic while keeping the original size constant (say 1024 objects). > [ 16.192449][ T1] BUG: Unable to handle kernel data access at 0xffffffffffffb2aa This doesn't seem kmemleak related from the trace.
> On Jul 31, 2019, at 5:53 AM, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Tue, Jul 30, 2019 at 04:22:37PM -0400, Qian Cai wrote: >> On Tue, 2019-07-30 at 12:57 -0700, Andrew Morton wrote: >>> On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> >>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>> @@ -2011,6 +2011,12 @@ >>>> Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, >>>> the default is off. >>>> >>>> + kmemleak.mempool= >>>> + [KNL] Boot-time tuning of the minimum kmemleak >>>> + metadata pool size. >>>> + Format: <int> >>>> + Default: NR_CPUS * 4 >>>> + >> >> Catalin, BTW, it is right now unable to handle a large size. I tried to reserve >> 64M (kmemleak.mempool=67108864), >> >> [ 0.039254][ T0] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:4707 __alloc_pages_nodemask+0x3b8/0x1780 > [...] >> [ 0.039646][ T0] NIP [c000000000395038] __alloc_pages_nodemask+0x3b8/0x1780 >> [ 0.039693][ T0] LR [c0000000003d9320] kmalloc_large_node+0x100/0x1a0 >> [ 0.039727][ T0] Call Trace: >> [ 0.039795][ T0] [c00000000170fc80] [c0000000003e5080] __kmalloc_node+0x520/0x890 >> [ 0.039816][ T0] [c00000000170fd20] [c0000000002e9544] mempool_init_node+0xb4/0x1e0 >> [ 0.039836][ T0] [c00000000170fd80] [c0000000002e975c] mempool_create_node+0xcc/0x150 >> [ 0.039857][ T0] [c00000000170fdf0] [c000000000b2a730] kmemleak_init+0x16c/0x54c >> [ 0.039878][ T0] [c00000000170fef0] [c000000000ae460c] start_kernel+0x69c/0x7cc >> [ 0.039908][ T0] [c00000000170ff90] [c00000000000a7d4] start_here_common+0x1c/0x434 > [...] >> [ 0.040100][ T0] kmemleak: Kernel memory leak detector disabled > > It looks like the mempool cannot be created. 64M objects means a > kmalloc(512MB) for the pool array in mempool_init_node(), so that hits > the MAX_ORDER warning in __alloc_pages_nodemask(). > > Maybe the mempool tunable won't help much for your case if you need so > many objects. It's still worth having a mempool for kmemleak but we > could look into changing the refill logic while keeping the original > size constant (say 1024 objects). Actually, kmemleak.mempool=524288 works quite well on systems I have here. This is more of making the code robust by error-handling a large value without the NULL-ptr-deref below. Maybe simply just validate the value by adding upper bound to not trigger that warning with MAX_ORDER. > >> [ 16.192449][ T1] BUG: Unable to handle kernel data access at 0xffffffffffffb2aa > > This doesn't seem kmemleak related from the trace. This only happens when passing a large kmemleak.mempool, e.g., 64M [ 16.193126][ T1] NIP [c000000000b2a2fc] log_early+0x8/0x160 [ 16.193153][ T1] LR [c0000000003e6e48] kmem_cache_free+0x428/0x740 [ 16.193190][ T1] Call Trace: [ 16.193213][ T1] [c00000002aaefc60] [0000000000000366] 0x366 (unreliable) [ 16.193243][ T1] [c00000002aaefd00] [c0000000003c9270] __mpol_put+0x50/0x70 [ 16.193272][ T1] [c00000002aaefd20] [c0000000003c9488] do_set_mempolicy+0x108/0x170 [ 16.193314][ T1] [c00000002aaefdb0] [c000000000010434] kernel_init+0x64/0x150 [ 16.193363][ T1] [c00000002aaefe20] [c00000000000b1cc] ret_from_kernel_thread+0x5c/0x70 # ./scripts/faddr2line vmlinux log_early+0x8/0x160 log_early+0x8/0x160: log_early at mm/kmemleak.c:859
On Wed, Jul 31, 2019 at 08:02:30AM -0400, Qian Cai wrote: > On Jul 31, 2019, at 5:53 AM, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Tue, Jul 30, 2019 at 04:22:37PM -0400, Qian Cai wrote: > >> On Tue, 2019-07-30 at 12:57 -0700, Andrew Morton wrote: > >>> On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> > >>>> --- a/Documentation/admin-guide/kernel-parameters.txt > >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt > >>>> @@ -2011,6 +2011,12 @@ > >>>> Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, > >>>> the default is off. > >>>> > >>>> + kmemleak.mempool= > >>>> + [KNL] Boot-time tuning of the minimum kmemleak > >>>> + metadata pool size. > >>>> + Format: <int> > >>>> + Default: NR_CPUS * 4 > >>>> + > >> > >> Catalin, BTW, it is right now unable to handle a large size. I tried to reserve > >> 64M (kmemleak.mempool=67108864), [...] > > It looks like the mempool cannot be created. 64M objects means a > > kmalloc(512MB) for the pool array in mempool_init_node(), so that hits > > the MAX_ORDER warning in __alloc_pages_nodemask(). > > > > Maybe the mempool tunable won't help much for your case if you need so > > many objects. It's still worth having a mempool for kmemleak but we > > could look into changing the refill logic while keeping the original > > size constant (say 1024 objects). > > Actually, kmemleak.mempool=524288 works quite well on systems I have here. This > is more of making the code robust by error-handling a large value without the > NULL-ptr-deref below. Maybe simply just validate the value by adding upper bound > to not trigger that warning with MAX_ORDER. Would it work for you with a Kconfig option, similar to DEBUG_KMEMLEAK_EARLY_LOG_SIZE? > >> [ 16.192449][ T1] BUG: Unable to handle kernel data access at 0xffffffffffffb2aa > > > > This doesn't seem kmemleak related from the trace. > > This only happens when passing a large kmemleak.mempool, e.g., 64M > > [ 16.193126][ T1] NIP [c000000000b2a2fc] log_early+0x8/0x160 > [ 16.193153][ T1] LR [c0000000003e6e48] kmem_cache_free+0x428/0x740 Ah, I missed the log_early() call here. It's a kmemleak bug where it isn't disabled properly in case of an error and log_early() is still called after the .text.init section was freed. I'll send a patch.
On Wed, 2019-07-31 at 15:48 +0100, Catalin Marinas wrote: > On Wed, Jul 31, 2019 at 08:02:30AM -0400, Qian Cai wrote: > > On Jul 31, 2019, at 5:53 AM, Catalin Marinas <catalin.marinas@arm.com> > > wrote: > > > On Tue, Jul 30, 2019 at 04:22:37PM -0400, Qian Cai wrote: > > > > On Tue, 2019-07-30 at 12:57 -0700, Andrew Morton wrote: > > > > > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@ar > > > > > m.com> > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > > > > > @@ -2011,6 +2011,12 @@ > > > > > > Built with > > > > > > CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, > > > > > > the default is off. > > > > > > > > > > > > + kmemleak.mempool= > > > > > > + [KNL] Boot-time tuning of the minimum > > > > > > kmemleak > > > > > > + metadata pool size. > > > > > > + Format: <int> > > > > > > + Default: NR_CPUS * 4 > > > > > > + > > > > > > > > Catalin, BTW, it is right now unable to handle a large size. I tried to > > > > reserve > > > > 64M (kmemleak.mempool=67108864), > > [...] > > > It looks like the mempool cannot be created. 64M objects means a > > > kmalloc(512MB) for the pool array in mempool_init_node(), so that hits > > > the MAX_ORDER warning in __alloc_pages_nodemask(). > > > > > > Maybe the mempool tunable won't help much for your case if you need so > > > many objects. It's still worth having a mempool for kmemleak but we > > > could look into changing the refill logic while keeping the original > > > size constant (say 1024 objects). > > > > Actually, kmemleak.mempool=524288 works quite well on systems I have here. > > This > > is more of making the code robust by error-handling a large value without > > the > > NULL-ptr-deref below. Maybe simply just validate the value by adding upper > > bound > > to not trigger that warning with MAX_ORDER. > > Would it work for you with a Kconfig option, similar to > DEBUG_KMEMLEAK_EARLY_LOG_SIZE? Yes, it should be fine. > > > > > [ 16.192449][ T1] BUG: Unable to handle kernel data access at > > > > 0xffffffffffffb2aa > > > > > > This doesn't seem kmemleak related from the trace. > > > > This only happens when passing a large kmemleak.mempool, e.g., 64M > > > > [ 16.193126][ T1] NIP [c000000000b2a2fc] log_early+0x8/0x160 > > [ 16.193153][ T1] LR [c0000000003e6e48] kmem_cache_free+0x428/0x740 > > Ah, I missed the log_early() call here. It's a kmemleak bug where it > isn't disabled properly in case of an error and log_early() is still > called after the .text.init section was freed. I'll send a patch. >
On Tue, Jul 30, 2019 at 01:02:15PM -0700, Andrew Morton wrote: > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > > > Add mempool allocations for struct kmemleak_object and > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > under memory pressure. Additionally, mask out all the gfp flags passed > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > different minimum pool size (defaulting to NR_CPUS * 4). > > btw, the checkpatch warnings are valid: > > WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc > #70: FILE: mm/kmemleak.c:197: > +static int min_object_pool = NR_CPUS * 4; > > WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc > #71: FILE: mm/kmemleak.c:198: > +static int min_scan_area_pool = NR_CPUS * 1; > > There can be situations where NR_CPUS is much larger than > num_possible_cpus(). Can we initialize these tunables within > kmemleak_init()? We could and, at least on arm64, cpu_possible_mask is already initialised at that point. However, that's a totally made up number. I think we would better go for a Kconfig option (defaulting to, say, 1024) similar to the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE and we grow it if people report better values in the future.
On Wed 31-07-19 16:44:50, Catalin Marinas wrote: > On Tue, Jul 30, 2019 at 01:02:15PM -0700, Andrew Morton wrote: > > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > > Add mempool allocations for struct kmemleak_object and > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > > under memory pressure. Additionally, mask out all the gfp flags passed > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > > different minimum pool size (defaulting to NR_CPUS * 4). > > > > btw, the checkpatch warnings are valid: > > > > WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc > > #70: FILE: mm/kmemleak.c:197: > > +static int min_object_pool = NR_CPUS * 4; > > > > WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc > > #71: FILE: mm/kmemleak.c:198: > > +static int min_scan_area_pool = NR_CPUS * 1; > > > > There can be situations where NR_CPUS is much larger than > > num_possible_cpus(). Can we initialize these tunables within > > kmemleak_init()? > > We could and, at least on arm64, cpu_possible_mask is already > initialised at that point. However, that's a totally made up number. I > think we would better go for a Kconfig option (defaulting to, say, 1024) > similar to the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE and we grow it if > people report better values in the future. If you really want/need to make this configurable then the command line parameter makes more sense - think of distribution kernel users for example. But I am still not sure why this is really needed. The initial size is a "made up" number of course. There is no good estimation to make (without a crystal ball). The value might be increased based on real life usage.
On Thu, Aug 01, 2019 at 08:41:53AM +0200, Michal Hocko wrote: > On Wed 31-07-19 16:44:50, Catalin Marinas wrote: > > On Tue, Jul 30, 2019 at 01:02:15PM -0700, Andrew Morton wrote: > > > On Sat, 27 Jul 2019 14:23:33 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > Add mempool allocations for struct kmemleak_object and > > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() > > > > under memory pressure. Additionally, mask out all the gfp flags passed > > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC. > > > > > > > > A boot-time tuning parameter (kmemleak.mempool) is added to allow a > > > > different minimum pool size (defaulting to NR_CPUS * 4). > > > > > > btw, the checkpatch warnings are valid: > > > > > > WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc > > > #70: FILE: mm/kmemleak.c:197: > > > +static int min_object_pool = NR_CPUS * 4; > > > > > > WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc > > > #71: FILE: mm/kmemleak.c:198: > > > +static int min_scan_area_pool = NR_CPUS * 1; > > > > > > There can be situations where NR_CPUS is much larger than > > > num_possible_cpus(). Can we initialize these tunables within > > > kmemleak_init()? > > > > We could and, at least on arm64, cpu_possible_mask is already > > initialised at that point. However, that's a totally made up number. I > > think we would better go for a Kconfig option (defaulting to, say, 1024) > > similar to the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE and we grow it if > > people report better values in the future. > > If you really want/need to make this configurable then the command line > parameter makes more sense - think of distribution kernel users for > example. I doubt you'd have pre-built distribution kernels with kmemleak enabled. > But I am still not sure why this is really needed. The initial > size is a "made up" number of course. There is no good estimation to > make (without a crystal ball). The value might be increased based on > real life usage. We had a similar situation with the early log buffer (before slab is initialised), initially 400 which was good enough for my needs (embedded systems) but others had entirely different requirements. A configurable (cmdline, Kconfig) option would make it easier for people to change, especially if coupled with a meaningful suggestion in dmesg. Another option is to use the early log as an emergency pool after initialisation instead of freeing it (it's currently __initdata) and drop the mempool idea. I may give this a go, at least we only have a single Kconfig option.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 46b826fcb5ad..11c413e3c42b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2011,6 +2011,12 @@ Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, the default is off. + kmemleak.mempool= + [KNL] Boot-time tuning of the minimum kmemleak + metadata pool size. + Format: <int> + Default: NR_CPUS * 4 + kprobe_event=[probe-list] [FTRACE] Add kprobe events and enable at boot time. The probe-list is a semicolon delimited list of probe diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 6e9e8cca663e..a31eab79bcf5 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -69,6 +69,7 @@ #include <linux/kthread.h> #include <linux/rbtree.h> #include <linux/fs.h> +#include <linux/mempool.h> #include <linux/debugfs.h> #include <linux/seq_file.h> #include <linux/cpumask.h> @@ -112,9 +113,7 @@ #define BYTES_PER_POINTER sizeof(void *) /* GFP bitmask for kmemleak internal allocations */ -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ - __GFP_NORETRY | __GFP_NOMEMALLOC | \ - __GFP_NOWARN) +#define gfp_kmemleak_mask(gfp) ((gfp) & (GFP_KERNEL | GFP_ATOMIC)) /* scanning area inside a memory block */ struct kmemleak_scan_area { @@ -190,7 +189,13 @@ static DEFINE_RWLOCK(kmemleak_lock); /* allocation caches for kmemleak internal data */ static struct kmem_cache *object_cache; +static mempool_t *object_mempool; static struct kmem_cache *scan_area_cache; +static mempool_t *scan_area_mempool; + +/* default minimum memory pool sizes */ +static int min_object_pool = NR_CPUS * 4; +static int min_scan_area_pool = NR_CPUS * 1; /* set if tracing memory operations is enabled */ static int kmemleak_enabled; @@ -465,9 +470,9 @@ static void free_object_rcu(struct rcu_head *rcu) */ hlist_for_each_entry_safe(area, tmp, &object->area_list, node) { hlist_del(&area->node); - kmem_cache_free(scan_area_cache, area); + mempool_free(area, scan_area_mempool); } - kmem_cache_free(object_cache, object); + mempool_free(object, object_mempool); } /* @@ -550,7 +555,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, struct rb_node **link, *rb_parent; unsigned long untagged_ptr; - object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp)); + object = mempool_alloc(object_mempool, gfp_kmemleak_mask(gfp)); if (!object) { pr_warn("Cannot allocate a kmemleak_object structure\n"); kmemleak_disable(); @@ -614,7 +619,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, * be freed while the kmemleak_lock is held. */ dump_object_info(parent); - kmem_cache_free(object_cache, object); + mempool_free(object, object_mempool); object = NULL; goto out; } @@ -772,7 +777,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) return; } - area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp)); + area = mempool_alloc(scan_area_mempool, gfp_kmemleak_mask(gfp)); if (!area) { pr_warn("Cannot allocate a scan area\n"); goto out; @@ -784,7 +789,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp) } else if (ptr + size > object->pointer + object->size) { kmemleak_warn("Scan area larger than object 0x%08lx\n", ptr); dump_object_info(object); - kmem_cache_free(scan_area_cache, area); + mempool_free(area, scan_area_mempool); goto out_unlock; } @@ -1993,6 +1998,27 @@ static int __init kmemleak_boot_config(char *str) } early_param("kmemleak", kmemleak_boot_config); +/* + * Allow boot-time tuning of the kmemleak mempool size. + */ +static int __init kmemleak_mempool_config(char *str) +{ + int size, ret; + + if (!str) + return -EINVAL; + + ret = kstrtoint(str, 0, &size); + if (ret) + return ret; + + min_object_pool = size; + min_scan_area_pool = size / 4; + + return 0; +} +early_param("kmemleak.mempool", kmemleak_mempool_config); + static void __init print_log_trace(struct early_log *log) { pr_notice("Early log backtrace:\n"); @@ -2020,6 +2046,18 @@ void __init kmemleak_init(void) object_cache = KMEM_CACHE(kmemleak_object, SLAB_NOLEAKTRACE); scan_area_cache = KMEM_CACHE(kmemleak_scan_area, SLAB_NOLEAKTRACE); + if (!object_cache || !scan_area_cache) { + kmemleak_disable(); + return; + } + object_mempool = mempool_create_slab_pool(min_object_pool, + object_cache); + scan_area_mempool = mempool_create_slab_pool(min_scan_area_pool, + scan_area_cache); + if (!object_mempool || !scan_area_mempool) { + kmemleak_disable(); + return; + } if (crt_early_log > ARRAY_SIZE(early_log)) pr_warn("Early log buffer exceeded (%d), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE\n", @@ -2126,7 +2164,7 @@ static int __init kmemleak_late_init(void) mutex_unlock(&scan_mutex); } - pr_info("Kernel memory leak detector initialized\n"); + pr_info("Kernel memory leak detector initialized (mempool size: %d)\n", min_object_pool); return 0; }
Add mempool allocations for struct kmemleak_object and kmemleak_scan_area as slightly more resilient than kmem_cache_alloc() under memory pressure. Additionally, mask out all the gfp flags passed to kmemleak other than GFP_KERNEL|GFP_ATOMIC. A boot-time tuning parameter (kmemleak.mempool) is added to allow a different minimum pool size (defaulting to NR_CPUS * 4). Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Qian Cai <cai@lca.pw> Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- v1 posted here: http://lkml.kernel.org/r/20190328145917.GC10283@arrakis.emea.arm.com Changes in v2: - kmemleak.mempool cmdline parameter to configure the minimum pool size - rebased against -next (on top of the __GFP_NOFAIL revert) .../admin-guide/kernel-parameters.txt | 6 ++ mm/kmemleak.c | 58 +++++++++++++++---- 2 files changed, 54 insertions(+), 10 deletions(-)