Message ID: 20190107143802.16847-1-mhocko@kernel.org (mailing list archive)
Series: oom, memcg: do not report racy no-eligible OOM
Tetsuo,
can you confirm that these two patches are fixing the issue you have
reported please?
On 2019/01/09 20:03, Michal Hocko wrote:
> Tetsuo,
> can you confirm that these two patches are fixing the issue you have
> reported please?
>

My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
report racy no-eligible OOM tasks" does.

You can post "[PATCH 1/2] mm, oom: marks all killed tasks as oom victims"
based on a report that we needlessly select more OOM victims because
MMF_OOM_SKIP is quickly set by the OOM reaper. In fact, updating
oom_reap_task() / exit_mmap() to use

	mutex_lock(&oom_lock);
	set_bit(MMF_OOM_SKIP, &mm->flags);
	mutex_unlock(&oom_lock);

will mostly close the race as well.
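(A minimal sketch of the locking Tetsuo alludes to above, for readers following the thread: the helper name and its exact placement in the OOM reaper / exit_mmap() paths are assumptions for illustration, not a posted patch.)

/*
 * Sketch only: publish MMF_OOM_SKIP under oom_lock so that a concurrent
 * out_of_memory() call cannot observe the flag "too early" and go on to
 * select an additional victim. oom_lock, MMF_OOM_SKIP and mm->flags are
 * existing kernel symbols; the helper itself is hypothetical.
 */
static void oom_mark_mm_reaped(struct mm_struct *mm)
{
	mutex_lock(&oom_lock);
	set_bit(MMF_OOM_SKIP, &mm->flags);
	mutex_unlock(&oom_lock);
}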
On Wed 09-01-19 20:34:46, Tetsuo Handa wrote:
> On 2019/01/09 20:03, Michal Hocko wrote:
> > Tetsuo,
> > can you confirm that these two patches are fixing the issue you have
> > reported please?
> >
>
> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
> report racy no-eligible OOM tasks" does.

OK, so we are stuck again. Hooray!
Michal Hocko wrote:
> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote:
> > On 2019/01/09 20:03, Michal Hocko wrote:
> > > Tetsuo,
> > > can you confirm that these two patches are fixing the issue you have
> > > reported please?
> > >
> >
> > My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
> > report racy no-eligible OOM tasks" does.
>
> OK, so we are stuck again. Hooray!

Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ?
Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch
does not close the race whereas my patch closes the race better.
On 2019/01/11 8:59, Tetsuo Handa wrote:
> Michal Hocko wrote:
>> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote:
>>> On 2019/01/09 20:03, Michal Hocko wrote:
>>>> Tetsuo,
>>>> can you confirm that these two patches are fixing the issue you have
>>>> reported please?
>>>>
>>>
>>> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
>>> report racy no-eligible OOM tasks" does.
>>
>> OK, so we are stuck again. Hooray!
>
> Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ?
> Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
> when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch
> does not close the race whereas my patch closes the race better.
>

I confirmed that mm-oom-marks-all-killed-tasks-as-oom-victims.patch and
memcg-do-not-report-racy-no-eligible-oom-tasks.patch are completely failing
to fix the issue I am reporting. :-(

Reproducer:
----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>

#define NUMTHREADS 256
#define MMAPSIZE 4 * 10485760
#define STACKSIZE 4096

static int pipe_fd[2] = { EOF, EOF };

static int memory_eater(void *unused)
{
	int fd = open("/dev/zero", O_RDONLY);
	char *buf = mmap(NULL, MMAPSIZE, PROT_WRITE | PROT_READ,
			 MAP_ANONYMOUS | MAP_SHARED, EOF, 0);

	read(pipe_fd[0], buf, 1);
	read(fd, buf, MMAPSIZE);
	pause();
	return 0;
}

int main(int argc, char *argv[])
{
	int i;
	char *stack;
	FILE *fp;
	const unsigned long size = 1048576UL * 200;

	mkdir("/sys/fs/cgroup/memory/test1", 0755);
	fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
	fprintf(fp, "%lu\n", size);
	fclose(fp);
	fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
	fprintf(fp, "%u\n", getpid());
	fclose(fp);
	if (setgid(-2) || setuid(-2) || pipe(pipe_fd))
		return 1;
	stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
		     MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
	for (i = 0; i < NUMTHREADS; i++)
		if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
			  CLONE_VM | CLONE_FS | CLONE_FILES, NULL) == -1)
			break;
	close(pipe_fd[1]);
	pause(); // Manually enter Ctrl-C immediately after dump_header() started.
	return 0;
}
----------

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20190111.txt.xz :
----------
[ 71.146532][ T9694] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 71.151647][ T9694] CPU: 1 PID: 9694 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc1-next-20190111 #272
(...snipped...)
[ 71.304689][ T9694] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9692,uid=-2
[ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child
[ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB
[ 71.328523][ T9748] a.out invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[ 71.328552][ T9748] CPU: 4 PID: 9748 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc1-next-20190111 #272
(...snipped...)
[ 71.328785][ T9748] Out of memory and no killable processes...
[ 71.329194][ T9771] a.out invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
(...snipped...)
[ 99.696592][ T9924] Out of memory and no killable processes...
[ 99.699001][ T9838] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 99.833413][ T9838] Out of memory and no killable processes...
----------

$ grep -F 'Out of memory and no killable processes...' serial-20190111.txt | wc -l
213
On Fri 11-01-19 19:25:22, Tetsuo Handa wrote:
> On 2019/01/11 8:59, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> >> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote:
> >>> On 2019/01/09 20:03, Michal Hocko wrote:
> >>>> Tetsuo,
> >>>> can you confirm that these two patches are fixing the issue you have
> >>>> reported please?
> >>>>
> >>>
> >>> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
> >>> report racy no-eligible OOM tasks" does.
> >>
> >> OK, so we are stuck again. Hooray!
> >
> > Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ?
> > Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
> > when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch
> > does not close the race whereas my patch closes the race better.
> >
>
> I confirmed that mm-oom-marks-all-killed-tasks-as-oom-victims.patch and
> memcg-do-not-report-racy-no-eligible-oom-tasks.patch are completely failing
> to fix the issue I am reporting. :-(

OK, this is really interesting. This means that we are racing
when marking all the tasks sharing the mm with the clone syscall.

Does fatal_signal_pending handle this better?
On 2019/01/11 20:33, Michal Hocko wrote:
> On Fri 11-01-19 19:25:22, Tetsuo Handa wrote:
>> On 2019/01/11 8:59, Tetsuo Handa wrote:
>>> Michal Hocko wrote:
>>>> On Wed 09-01-19 20:34:46, Tetsuo Handa wrote:
>>>>> On 2019/01/09 20:03, Michal Hocko wrote:
>>>>>> Tetsuo,
>>>>>> can you confirm that these two patches are fixing the issue you have
>>>>>> reported please?
>>>>>>
>>>>>
>>>>> My patch fixes the issue better than your "[PATCH 2/2] memcg: do not
>>>>> report racy no-eligible OOM tasks" does.
>>>>
>>>> OK, so we are stuck again. Hooray!
>>>
>>> Andrew, will you pick up "[PATCH 3/2] memcg: Facilitate termination of memcg OOM victims." ?
>>> Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
>>> when task_will_free_mem() == true, memcg-do-not-report-racy-no-eligible-oom-tasks.patch
>>> does not close the race whereas my patch closes the race better.
>>>
>>
>> I confirmed that mm-oom-marks-all-killed-tasks-as-oom-victims.patch and
>> memcg-do-not-report-racy-no-eligible-oom-tasks.patch are completely failing
>> to fix the issue I am reporting. :-(
>
> OK, this is really interesting. This means that we are racing
> when marking all the tasks sharing the mm with the clone syscall.

Nothing interesting. This is NOT a race between clone() and the OOM killer. :-(
By the moment the OOM killer is invoked, all clone() requests are already completed.

Did you notice that there is no

"Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n"

line between

[ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child

and

[ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB

? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside
__oom_kill_process() in the first round of out_of_memory() call because
find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim
complete exit_mm() before find_lock_task_mm() is called.

Then, in the second round of out_of_memory() call, [ T9750] (which is
fatal_signal_pending() == T && tsk_is_oom_victim() == F) hit
task_will_free_mem(current) path and called mark_oom_victim() and woke up
the OOM reaper. Then, before the third round of out_of_memory() call starts,
the OOM reaper set MMF_OOM_SKIP.

When the third round of out_of_memory() call started, [ T9748] could not hit
task_will_free_mem(current) path because MMF_OOM_SKIP was already set, and
oom_badness() ignored any mm which already has MMF_OOM_SKIP. As a result,
[ T9748] failed to find a candidate. And this step repeats for up to number
of threads (213 times for this run).

> Does fatal_signal_pending handle this better?
>

Of course. My patch handles it perfectly. Even if we raced with clone()
requests, why do we need to care about threads doing clone() requests?
Such threads are not inside try_charge(), and therefore such threads can't
contribute to this issue by calling out_of_memory() from try_charge().
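(For readers following the thread: the shortcut [ T9750] hit is the task_will_free_mem(current) check near the top of out_of_memory(). Abridged from the mm/oom_kill.c of that era, with the comment paraphrased, it reads roughly as below; this is the path that marks only current as an OOM victim and wakes the OOM reaper without printing a "Killed process" line.)

	/*
	 * If current has a pending SIGKILL or is already exiting, select it
	 * automatically so that it can allocate, exit and free its memory
	 * quickly, skipping victim selection and the usual kill report.
	 */
	if (task_will_free_mem(current)) {
		mark_oom_victim(current);
		wake_oom_reaper(current);
		return true;
	}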
On Fri 11-01-19 21:40:52, Tetsuo Handa wrote:
[...]
> Did you notice that there is no
>
> "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n"
>
> line between
>
> [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child
>
> and
>
> [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB
>
> ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside
> __oom_kill_process() in the first round of out_of_memory() call because
> find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim
> complete exit_mm() before find_lock_task_mm() is called.

OK, so we haven't killed anything because the victim has exited by the
time we wanted to do so. We still have other tasks sharing that mm
pending and not killed because nothing has killed them yet, right?

How come the oom reaper could act on this oom event at all then?

What am I missing?
On 2019/01/11 22:34, Michal Hocko wrote:
> On Fri 11-01-19 21:40:52, Tetsuo Handa wrote:
> [...]
>> Did you notice that there is no
>>
>> "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n"
>>
>> line between
>>
>> [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child
>>
>> and
>>
>> [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB
>>
>> ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside
>> __oom_kill_process() in the first round of out_of_memory() call because
>> find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim
>> complete exit_mm() before find_lock_task_mm() is called.
>
> OK, so we haven't killed anything because the victim has exited by the
> time we wanted to do so. We still have other tasks sharing that mm
> pending and not killed because nothing has killed them yet, right?

The OOM killer invoked by [ T9694] called printk() but didn't kill anything.
Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm.

>
> How come the oom reaper could act on this oom event at all then?
>
> What am I missing?
>

The OOM killer invoked by [ T9750] did not call printk() but hit
task_will_free_mem(current) in out_of_memory() and invoked the OOM reaper,
without calling mark_oom_victim() on all thread groups sharing current->mm.
Did you notice that I wrote that

Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
when task_will_free_mem() == true,

? :-(
On Fri 11-01-19 23:31:18, Tetsuo Handa wrote:
> On 2019/01/11 22:34, Michal Hocko wrote:
> > On Fri 11-01-19 21:40:52, Tetsuo Handa wrote:
> > [...]
> >> Did you notice that there is no
> >>
> >> "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n"
> >>
> >> line between
> >>
> >> [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child
> >>
> >> and
> >>
> >> [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB
> >>
> >> ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside
> >> __oom_kill_process() in the first round of out_of_memory() call because
> >> find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim
> >> complete exit_mm() before find_lock_task_mm() is called.
> >
> > OK, so we haven't killed anything because the victim has exited by the
> > time we wanted to do so. We still have other tasks sharing that mm
> > pending and not killed because nothing has killed them yet, right?
>
> The OOM killer invoked by [ T9694] called printk() but didn't kill anything.
> Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm.

I still do not get it. Those other processes are not sharing signals.
Or is it due to injecting the signal too all of them with the proper
timing?

> > How come the oom reaper could act on this oom event at all then?
> >
> > What am I missing?
> >
>
> The OOM killer invoked by [ T9750] did not call printk() but hit
> task_will_free_mem(current) in out_of_memory() and invoked the OOM reaper,
> without calling mark_oom_victim() on all thread groups sharing current->mm.
> Did you notice that I wrote that

OK, now it starts making sense to me finally. I got hooked up in
find_lock_task_mm failing in __oom_kill_process because we do see "Memory
cgroup out of memory" and that happens _after_ task_will_free_mem. So the
whole oom_reaper scenario didn't make much sense to me.

> Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
> when task_will_free_mem() == true,
>
> ? :-(

No, I got lost in your writeup. While the task_will_free_mem is fixable
but this would get us to even uglier code so I agree that the approach by
my two patches is not feasible.

I really wanted to have this heuristic based on the oom victim rather
than signal pending because one lesson I've learned over time was that
checks for fatal signals can lead to odd corner cases. Memcg is less
prone to those issues because we can bypass the charge but still.

Anyway, could you update your patch and abstract
	if (unlikely(tsk_is_oom_victim(current) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))

in try_charge and reuse it in mem_cgroup_out_of_memory under the
oom_lock with an explanation please?

Andrew, please drop my 2 patches please.
On 2019/01/12 0:07, Michal Hocko wrote:
> On Fri 11-01-19 23:31:18, Tetsuo Handa wrote:
>> The OOM killer invoked by [ T9694] called printk() but didn't kill anything.
>> Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm.
>
> I still do not get it. Those other processes are not sharing signals.
> Or is it due to injecting the signal too all of them with the proper
> timing?

Pressing Ctrl-C between after task_will_free_mem(p) in oom_kill_process() and
before __oom_kill_process() (e.g. dump_header()) made fatal_signal_pending() = T
for all of them.

> Anyway, could you update your patch and abstract
> 	if (unlikely(tsk_is_oom_victim(current) ||
> 		     fatal_signal_pending(current) ||
> 		     current->flags & PF_EXITING))
>
> in try_charge and reuse it in mem_cgroup_out_of_memory under the
> oom_lock with an explanation please?

I don't think doing so makes sense, for

tsk_is_oom_victim(current) = T && fatal_signal_pending(current) == F

can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
current->flags cannot get PF_EXITING when current is inside
mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
appropriate for mem_cgroup_out_of_memory() under the oom_lock because

tsk_is_oom_victim(current) = F && fatal_signal_pending(current) == T

can happen there.

Also, doing so might become wrong in future, for mem_cgroup_out_of_memory()
is also called from memory_max_write() which does not bail out upon
PF_EXITING. I don't think we can call memory_max_write() after current
thread got PF_EXITING, but nobody knows what change will happen in future.
On Sat 12-01-19 00:37:05, Tetsuo Handa wrote:
> On 2019/01/12 0:07, Michal Hocko wrote:
> > On Fri 11-01-19 23:31:18, Tetsuo Handa wrote:
> >> The OOM killer invoked by [ T9694] called printk() but didn't kill anything.
> >> Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm.
> >
> > I still do not get it. Those other processes are not sharing signals.
> > Or is it due to injecting the signal too all of them with the proper
> > timing?
>
> Pressing Ctrl-C between after task_will_free_mem(p) in oom_kill_process() and
> before __oom_kill_process() (e.g. dump_header()) made fatal_signal_pending() = T
> for all of them.
>
> > Anyway, could you update your patch and abstract
> > 	if (unlikely(tsk_is_oom_victim(current) ||
> > 		     fatal_signal_pending(current) ||
> > 		     current->flags & PF_EXITING))
> >
> > in try_charge and reuse it in mem_cgroup_out_of_memory under the
> > oom_lock with an explanation please?
>
> I don't think doing so makes sense, for
>
> tsk_is_oom_victim(current) = T && fatal_signal_pending(current) == F
>
> can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
> current->flags cannot get PF_EXITING when current is inside
> mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
> appropriate for mem_cgroup_out_of_memory() under the oom_lock because
>
> tsk_is_oom_victim(current) = F && fatal_signal_pending(current) == T
>
> can happen there.

I meant to use the same check consistently. If we can bypass the charge
under a list of conditions in the charge path we should be surely be
able to the the same for the oom path. I will not insist but unless
there is a strong reason I would prefer that.

> Also, doing so might become wrong in future, for mem_cgroup_out_of_memory()
> is also called from memory_max_write() which does not bail out upon
> PF_EXITING. I don't think we can call memory_max_write() after current
> thread got PF_EXITING, but nobody knows what change will happen in future.

No, this makes no sense what so ever.
On 2019/01/12 1:45, Michal Hocko wrote:
>>> Anyway, could you update your patch and abstract
>>> 	if (unlikely(tsk_is_oom_victim(current) ||
>>> 		     fatal_signal_pending(current) ||
>>> 		     current->flags & PF_EXITING))
>>>
>>> in try_charge and reuse it in mem_cgroup_out_of_memory under the
>>> oom_lock with an explanation please?
>>
>> I don't think doing so makes sense, for
>>
>> tsk_is_oom_victim(current) = T && fatal_signal_pending(current) == F
>>
>> can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
>> current->flags cannot get PF_EXITING when current is inside
>> mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
>> appropriate for mem_cgroup_out_of_memory() under the oom_lock because
>>
>> tsk_is_oom_victim(current) = F && fatal_signal_pending(current) == T
>>
>> can happen there.
>
> I meant to use the same check consistently. If we can bypass the charge
> under a list of conditions in the charge path we should be surely be
> able to the the same for the oom path. I will not insist but unless
> there is a strong reason I would prefer that.
>

You mean something like this? I'm not sure this change is safe.

 mm/memcontrol.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 17189da..1733d019 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -248,6 +248,12 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
+static inline bool can_ignore_limit(void)
+{
+	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
+		(current->flags & PF_EXITING);
+}
+
 /* Some nice accessors for the vmpressure. */
 struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
 {
@@ -1395,7 +1401,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = fatal_signal_pending(current) || out_of_memory(&oc);
+	ret = can_ignore_limit() || out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
 	return ret;
 }
@@ -1724,6 +1730,10 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 
 	mem_cgroup_unmark_under_oom(memcg);
 	if (mem_cgroup_out_of_memory(memcg, mask, order))
+		/*
+		 * Returning OOM_SUCCESS upon can_ignore_limit() is OK, for
+		 * the caller will check can_ignore_limit() again.
+		 */
 		ret = OOM_SUCCESS;
 	else
 		ret = OOM_FAILED;
@@ -1783,6 +1793,11 @@ bool mem_cgroup_oom_synchronize(bool handle)
 		finish_wait(&memcg_oom_waitq, &owait.wait);
 		mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
 					 current->memcg_oom_order);
+		/*
+		 * Returning upon can_ignore_limit() is OK, for the caller is
+		 * already killed... CheckMe: Is this assumption correct?
+		 * Page fault can't happen after getting PF_EXITING?
+		 */
 	} else {
 		schedule();
 		mem_cgroup_unmark_under_oom(memcg);
@@ -2215,9 +2230,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * bypass the last charges so that they can exit quickly and
 	 * free their memory.
 	 */
-	if (unlikely(tsk_is_oom_victim(current) ||
-		     fatal_signal_pending(current) ||
-		     current->flags & PF_EXITING))
+	if (unlikely(can_ignore_limit()))
 		goto force;
 
 	/*
@@ -5527,6 +5540,12 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		memcg_memory_event(memcg, MEMCG_OOM);
 		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
 			break;
+		/*
+		 * There is no need to check can_ignore_limit() here, for
+		 * signal_pending(current) above will break anyway.
+		 */
+		if (unlikely(can_ignore_limit()))
+			break;
 	}
 
 	memcg_wb_domain_size_changed(memcg);
On Sat 12-01-19 19:52:50, Tetsuo Handa wrote:
> On 2019/01/12 1:45, Michal Hocko wrote:
> >>> Anyway, could you update your patch and abstract
> >>> 	if (unlikely(tsk_is_oom_victim(current) ||
> >>> 		     fatal_signal_pending(current) ||
> >>> 		     current->flags & PF_EXITING))
> >>>
> >>> in try_charge and reuse it in mem_cgroup_out_of_memory under the
> >>> oom_lock with an explanation please?
> >>
> >> I don't think doing so makes sense, for
> >>
> >> tsk_is_oom_victim(current) = T && fatal_signal_pending(current) == F
> >>
> >> can't happen for mem_cgroup_out_of_memory() under the oom_lock, and
> >> current->flags cannot get PF_EXITING when current is inside
> >> mem_cgroup_out_of_memory(). fatal_signal_pending(current) alone is
> >> appropriate for mem_cgroup_out_of_memory() under the oom_lock because
> >>
> >> tsk_is_oom_victim(current) = F && fatal_signal_pending(current) == T
> >>
> >> can happen there.
> >
> > I meant to use the same check consistently. If we can bypass the charge
> > under a list of conditions in the charge path we should be surely be
> > able to the the same for the oom path. I will not insist but unless
> > there is a strong reason I would prefer that.
> >
>
> You mean something like this? I'm not sure this change is safe.
>
>  mm/memcontrol.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 17189da..1733d019 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -248,6 +248,12 @@ enum res_type {
>  	     iter != NULL;				\
>  	     iter = mem_cgroup_iter(NULL, iter, NULL))
>  
> +static inline bool can_ignore_limit(void)
> +{
> +	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> +		(current->flags & PF_EXITING);
> +}
> +
>  /* Some nice accessors for the vmpressure. */
>  struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
>  {
> @@ -1395,7 +1401,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * A few threads which were not waiting at mutex_lock_killable() can
>  	 * fail to bail out. Therefore, check again after holding oom_lock.
>  	 */
> -	ret = fatal_signal_pending(current) || out_of_memory(&oc);
> +	ret = can_ignore_limit() || out_of_memory(&oc);
>  	mutex_unlock(&oom_lock);
>  	return ret;
>  }
> @@ -2215,9 +2230,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * bypass the last charges so that they can exit quickly and
>  	 * free their memory.
>  	 */
> -	if (unlikely(tsk_is_oom_victim(current) ||
> -		     fatal_signal_pending(current) ||
> -		     current->flags & PF_EXITING))
> +	if (unlikely(can_ignore_limit()))
>  		goto force;
>  
>  	/*

I meant something as simple as this, indeed. I would just
s@can_ignore_limit@should_force_charge@ but this is a minor thing.
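(The substitution Michal suggests keeps the helper body from Tetsuo's diff and only changes its name; as a sketch:)

/* Identical to can_ignore_limit() in the diff above, under the suggested name. */
static inline bool should_force_charge(void)
{
	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
		(current->flags & PF_EXITING);
}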