Message ID: 20210519201743.3260890-1-atomlin@redhat.com
State: New, archived
Series: [v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:

> It does not make sense to retry compaction when a fatal signal is
> pending.

Well, it might make sense.  Presumably it is beneficial to other tasks.

> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
>
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.

What observed problems motivated this change?

What were the observed runtime effects of this change?
On 5/20/21 6:34 AM, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:
>
>> It does not make sense to retry compaction when a fatal signal is
>> pending.
>
> Well, it might make sense.  Presumably it is beneficial to other tasks.

Yeah but the compaction won't happen. compact_zone() will immediately
detect it via __compact_finished() and bail out. So in that sense it does
not make sense to retry :)

>> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
>> returned; albeit, not every zone, on the zone list, would be considered
>> in the case a fatal signal is found to be pending.
>> Yet, in should_compact_retry(), given the last known compaction result,
>> each zone, on the zone list, can be considered/or checked
>> (see compaction_zonelist_suitable()). For example, if a zone was found
>> to succeed, then reclaim/compaction would be tried again
>> (notwithstanding the above).
>>
>> This patch ensures that compaction is not needlessly retried
>> irrespective of the last known compaction result e.g. if it was skipped,
>> in the unlikely case a fatal signal is found pending.
>> So, OOM is at least attempted.
>
> What observed problems motivated this change?
>
> What were the observed runtime effects of this change?

Yep those details from the previous thread should be included here.
On Wed, May 19, 2021 at 09:34:55PM -0700, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:
>
> > It does not make sense to retry compaction when a fatal signal is
> > pending.
>
> Well, it might make sense.  Presumably it is beneficial to other tasks.

Apart from Vlastimil's point, if I hit ^C, I want the task to die, as soon
as possible. I don't want it to do things which are beneficial to other
tasks, I want my shell prompt back, not spending seconds trying to compact
memory. Some other task can do that if _it_ needs large contiguous chunks.
On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> On 5/20/21 6:34 AM, Andrew Morton wrote:
> >
> > What observed problems motivated this change?
> >
> > What were the observed runtime effects of this change?
>
> Yep those details from the previous thread should be included here.

Fair enough.

During kernel crash dump/vmcore analysis, I discovered in the context of
__alloc_pages_slowpath() that the value stored in the no_progress_loops
variable was 31,611,688 i.e. well above MAX_RECLAIM_RETRIES, and a fatal
signal was pending against current.

 #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
 #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
 #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130

// w28 = *(sp + 148) /* no_progress_loops */
0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr  w0, [sp,#148]
// w0 = w0 + 0x1
0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add  w0, w0, #0x1
// *(sp + 148) = w0
0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str  w0, [sp,#148]
// if (w0 >= 0x10)
//     goto __alloc_pages_nodemask+0x904
0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp  w0, #0x10
0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534

- The stack pointer was 0xffff00002e78f900

crash> p *(int *)(0xffff00002e78f900+148)
$1 = 31611688

crash> ps 521171
    PID    PPID  CPU       TASK        ST  %MEM      VSZ     RSS  COMM
> 521171       1   36  ffff8080e2128800  RU  0.0  34789440  18624  special

crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
$2 = (struct sigpending *) 0xffff80809a416e40

crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
$3 = 0x804100

crash> sig -s 0x804100
SIGKILL SIGTERM SIGXCPU

crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
$4 = 0x100

Unfortunately, this incident was not reproduced, to date.

Kind regards,
On Thu, May 20, 2021 at 12:42:57PM +0100, Aaron Tomlin wrote:
> On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> > On 5/20/21 6:34 AM, Andrew Morton wrote:
> > >
> > > What observed problems motivated this change?
> > >
> > > What were the observed runtime effects of this change?
> >
> > Yep those details from the previous thread should be included here.
>
> Fair enough.
>
> During kernel crash dump/or vmcore analysis: I discovered in the context of
> __alloc_pages_slowpath() the value stored in the no_progress_loops variable
> was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
> signal was pending against current.

While this is true, it's not really answering Andrew's question.
What we want as part of the commit message is something like:

"A customer experienced a low memory situation and sent their task a
fatal signal. Instead of dying promptly, it looped in the page
allocator failing to make progress because ..."

> #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
> #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
> #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
>
> // w28 = *(sp + 148) /* no_progress_loops */
> 0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr  w0, [sp,#148]
> // w0 = w0 + 0x1
> 0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add  w0, w0, #0x1
> // *(sp + 148) = w0
> 0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str  w0, [sp,#148]
> // if (w0 >= 0x10)
> //     goto __alloc_pages_nodemask+0x904
> 0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp  w0, #0x10
> 0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534
>
> - The stack pointer was 0xffff00002e78f900
>
> crash> p *(int *)(0xffff00002e78f900+148)
> $1 = 31611688
>
> crash> ps 521171
>     PID    PPID  CPU       TASK        ST  %MEM      VSZ     RSS  COMM
> > 521171       1   36  ffff8080e2128800  RU  0.0  34789440  18624  special
>
> crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
> $2 = (struct sigpending *) 0xffff80809a416e40
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
> $3 = 0x804100
>
> crash> sig -s 0x804100
> SIGKILL SIGTERM SIGXCPU
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
> $4 = 0x100
>
> Unfortunately, this incident was not reproduced, to date.
>
> Kind regards,
>
> --
> Aaron Tomlin
Matthew,

On Thu 2021-05-20 12:56 +0100, Matthew Wilcox wrote:
> While this is true, it's not really answering Andrew's question.
> What we want as part of the commit message is something like:
>
> "A customer experienced a low memory situation and sent their task a
> fatal signal. Instead of dying promptly, it looped in the page
> allocator failing to make progress because ..."

Apologies, I did not understand - I will follow up with a v4.

Kind regards,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (!order)
 		return false;
 
+	if (fatal_signal_pending(current))
+		return false;
+
 	if (compaction_made_progress(compact_result))
 		(*compaction_retries)++;
It does not make sense to retry compaction when a fatal signal is
pending.

In the context of try_to_compact_pages(), COMPACT_SKIPPED can indeed be
returned; albeit, not every zone on the zone list would be considered when
a fatal signal is found to be pending. Yet in should_compact_retry(),
given the last known compaction result, each zone on the zone list can be
considered/checked (see compaction_zonelist_suitable()). For example, if a
zone was found to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).

This patch ensures that compaction is not needlessly retried, irrespective
of the last known compaction result (e.g. if it was skipped), in the
unlikely case a fatal signal is found pending. So, OOM handling is at
least attempted.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)