Message ID | 20191028194906.26899-1-hannes@cmpxchg.org (mailing list archive) |
---|---|
State | New, archived |
Series | mm: rate-limit allocation failure warnings more aggressively |
On Mon, 28 Oct 2019, Johannes Weiner wrote:

> While investigating a bug related to higher atomic allocation
> failures, we noticed the failure warnings positively drowning the
> console, and in our case trigger lockup warnings because of a serial
> console too slow to handle all that output.
>
> But even if we had a faster console, it's unclear what additional
> information the current level of repetition provides.
>
> Allocation failures happen for three reasons: The machine is OOM, the
> VM is failing to handle reasonable requests, or somebody is making
> unreasonable requests (and didn't acknowledge their opportunism with
> __GFP_NOWARN). Having the memory dump, a callstack, and the ratelimit
> stats on skipped failure warnings should provide enough information to
> let users/admins/developers know whether something is wrong and point
> them in the right direction for debugging, bpftracing etc.
>
> Limit allocation failure warnings to 1 spew every ten seconds.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: David Rientjes <rientjes@google.com>

It feels like the vmalloc warnings should be treated with their own
ratelimit (pass a struct ratelimit_state * to warn_alloc()) but that's
outside the scope of this particular change.
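The suggestion in the reply, a limiter supplied by the caller rather than one shared static inside warn_alloc(), could look roughly like the userspace sketch below. Every name in it (`struct warn_budget`, `warn_budget_take()`, `warn_alloc_sketch()`) is hypothetical and is not kernel API. The limiter is a deliberately simplified "one message per interval" model, not the kernel's `struct ratelimit_state`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-caller limiter: at most one warning per interval. */
struct warn_budget {
	unsigned long interval;		/* window length in ticks */
	unsigned long window_start;	/* tick at which the window opened */
	bool open;			/* false until the first warning */
};

/* Returns true if the caller may warn at time `now`. */
static bool warn_budget_take(struct warn_budget *b, unsigned long now)
{
	assert(b != NULL);

	if (b->open && now - b->window_start < b->interval)
		return false;	/* still inside the current window */

	b->window_start = now;
	b->open = true;
	return true;
}

/*
 * Hypothetical warning helper that takes the caller's limiter.
 * `nowarn` stands in for a __GFP_NOWARN check; it is tested first so
 * that silenced requests do not use up the caller's budget.
 * Returns true if a warning would have been printed.
 */
static bool warn_alloc_sketch(struct warn_budget *b, bool nowarn,
			      unsigned long now)
{
	if (nowarn)
		return false;
	return warn_budget_take(b, now);
}
```

With one `struct warn_budget` for page allocator callers and another for vmalloc callers, a burst of failures on one path no longer suppresses the first warning from the other.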
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 791c018314b3..f412b17b5d59 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3720,10 +3720,6 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
 {
 	unsigned int filter = SHOW_MEM_FILTER_NODES;
-	static DEFINE_RATELIMIT_STATE(show_mem_rs, HZ, 1);
-
-	if (!__ratelimit(&show_mem_rs))
-		return;
 
 	/*
 	 * This documents exceptions given to allocations in certain
@@ -3744,8 +3740,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;
-	static DEFINE_RATELIMIT_STATE(nopage_rs, DEFAULT_RATELIMIT_INTERVAL,
-				      DEFAULT_RATELIMIT_BURST);
+	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
 
 	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
 		return;
While investigating a bug related to higher atomic allocation
failures, we noticed the failure warnings positively drowning the
console, and in our case trigger lockup warnings because of a serial
console too slow to handle all that output.

But even if we had a faster console, it's unclear what additional
information the current level of repetition provides.

Allocation failures happen for three reasons: The machine is OOM, the
VM is failing to handle reasonable requests, or somebody is making
unreasonable requests (and didn't acknowledge their opportunism with
__GFP_NOWARN). Having the memory dump, a callstack, and the ratelimit
stats on skipped failure warnings should provide enough information to
let users/admins/developers know whether something is wrong and point
them in the right direction for debugging, bpftracing etc.

Limit allocation failure warnings to 1 spew every ten seconds.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
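To get a feel for what "1 spew every ten seconds" means next to an interval/burst pair such as `5*HZ, 10` (which, to the best of my knowledge, is what DEFAULT_RATELIMIT_INTERVAL and DEFAULT_RATELIMIT_BURST expand to), here is a small userspace model of an interval/burst limiter. It is only a sketch: `struct rl_model`, `rl_model_allow()`, `count_warnings()` and `MODEL_HZ` are hypothetical names, time is a plain tick counter, and the window handling is simplified compared to the kernel's `__ratelimit()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HZ 100UL	/* assumed ticks per second, for the model only */

/* Hypothetical model of an interval/burst limiter. */
struct rl_model {
	unsigned long interval;	/* window length in ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the window opened */
	unsigned int printed;	/* messages let through in this window */
	unsigned long missed;	/* total messages suppressed so far */
	bool started;		/* false until the first call */
};

/* Returns true if a message may be printed at time `now`. */
static bool rl_model_allow(struct rl_model *rs, unsigned long now)
{
	assert(rs != NULL);

	/* Open a new window on first use or once the old one expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;	/* counted so skipped warnings can be reported */
	return false;
}

/* Warnings that get through if an allocation fails on every tick. */
static unsigned long count_warnings(unsigned long interval,
				    unsigned int burst,
				    unsigned long seconds)
{
	struct rl_model rs = { .interval = interval, .burst = burst };
	unsigned long shown = 0;

	for (unsigned long t = 0; t < seconds * MODEL_HZ; t++)
		if (rl_model_allow(&rs, t))
			shown++;
	return shown;
}
```

Under this model, one minute of back-to-back failures gives `count_warnings(5 * MODEL_HZ, 10, 60) == 120` but `count_warnings(10 * MODEL_HZ, 1, 60) == 6`. The `missed` counter corresponds to the "ratelimit stats on skipped failure warnings" mentioned in the changelog.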