Message ID | 20180529081639.GM27180@dhcp22.suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2018/05/29 17:16, Michal Hocko wrote: > With the full changelog. This can be either folded into the respective > patch or applied on top. > >>From 0bd619e7a68337c97bdaed288e813e96a14ba339 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Tue, 29 May 2018 10:09:33 +0200 > Subject: [PATCH] mm, memcg, oom: fix pre-mature allocation failures > > Tetsuo has noticed that "mm, oom: cgroup-aware OOM killer" can lead to a > pre-mature allocation failure if the cgroup aware oom killer is enabled > and select_victim_memcg doesn't pick up any memcg to kill because there > is a memcg already being killed. oc->chosen_memcg will become INFLIGHT_VICTIM > and oom_kill_memcg_victim will bail out early. oc->chosen_task will > stay NULL, however, and out_of_memory will therefore return false which > forces __alloc_pages_may_oom to not set did_some_progress and the page > allocator backs out and fails the allocation. > U > Fix this by checking both chosen_task and chosen_memcg in out_of_memory > and return false only when _both_ are NULL. I don't like this patch. It is not easy to understand and is fragile to future changes. Currently the only case !!oc->chosen can become false is that there was no eligible tasks when SysRq-f was requested or memcg OOM occurred. /* Found nothing?!?! Either we hang forever, or we panic. */ if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { With this patch applied, what happens if mem_cgroup_select_oom_victim(oc) && oom_kill_memcg_victim(oc) forgot to set oc->chosen_memcg to NULL and called select_bad_process(oc) and reached /* Found nothing?!?! Either we hang forever, or we panic. */ if (!oc->chosen_task && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { but did not trigger panic() because of is_sysrq_oom(oc) || is_memcg_oom(oc) and reached the last "!!(oc->chosen_task | oc->chosen_memcg)" line? It will by error return "true" when no eligible tasks found... Don't make return conditions complicated. The appropriate fix is to kill "delay" and "goto out;" now! My patch does it!!
On Tue 29-05-18 23:33:13, Tetsuo Handa wrote: > On 2018/05/29 17:16, Michal Hocko wrote: > > With the full changelog. This can be either folded into the respective > > patch or applied on top. > > > >>From 0bd619e7a68337c97bdaed288e813e96a14ba339 Mon Sep 17 00:00:00 2001 > > From: Michal Hocko <mhocko@suse.com> > > Date: Tue, 29 May 2018 10:09:33 +0200 > > Subject: [PATCH] mm, memcg, oom: fix pre-mature allocation failures > > > > Tetsuo has noticed that "mm, oom: cgroup-aware OOM killer" can lead to a > > pre-mature allocation failure if the cgroup aware oom killer is enabled > > and select_victim_memcg doesn't pick up any memcg to kill because there > > is a memcg already being killed. oc->chosen_memcg will become INFLIGHT_VICTIM > > and oom_kill_memcg_victim will bail out early. oc->chosen_task will > > stay NULL, however, and out_of_memory will therefore return false which > > forces __alloc_pages_may_oom to not set did_some_progress and the page > > allocator backs out and fails the allocation. > > U > > Fix this by checking both chosen_task and chosen_memcg in out_of_memory > > and return false only when _both_ are NULL. > > I don't like this patch. It is not easy to understand and is fragile to > future changes. Currently the only case !!oc->chosen can become false is that > there was no eligible tasks when SysRq-f was requested or memcg OOM occurred. Well, the current contract is not easy unfortunatelly. We have two different modes of operation. We are either killing whole cgroups or a task from a cgroup. In any case, the contract says that if we have any killable entity then at least one of chosen* is set to INFLIGHT_VICTIM. Other than that one of them has to be !NULL or we have no eligible killable entity. The return value reflects all these cases. > /* Found nothing?!?! Either we hang forever, or we panic. */ > if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > > With this patch applied, what happens if > mem_cgroup_select_oom_victim(oc) && oom_kill_memcg_victim(oc) forgot to set > oc->chosen_memcg to NULL and called select_bad_process(oc) and reached Well, this would be a clear bug and breaking the expected contract. I do not see such a bug. Do you? > /* Found nothing?!?! Either we hang forever, or we panic. */ > if (!oc->chosen_task && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > > but did not trigger panic() because of is_sysrq_oom(oc) || is_memcg_oom(oc) > and reached the last "!!(oc->chosen_task | oc->chosen_memcg)" line? > It will by error return "true" when no eligible tasks found... What are you talking about? > Don't make return conditions complicated. > The appropriate fix is to kill "delay" and "goto out;" now! My patch does it!! I will leave the decision about which fix to take to Roman.
On Tue 29-05-18 19:18:33, Michal Hocko wrote: > On Tue 29-05-18 23:33:13, Tetsuo Handa wrote: > > On 2018/05/29 17:16, Michal Hocko wrote: > > > With the full changelog. This can be either folded into the respective > > > patch or applied on top. > > > > > >>From 0bd619e7a68337c97bdaed288e813e96a14ba339 Mon Sep 17 00:00:00 2001 > > > From: Michal Hocko <mhocko@suse.com> > > > Date: Tue, 29 May 2018 10:09:33 +0200 > > > Subject: [PATCH] mm, memcg, oom: fix pre-mature allocation failures > > > > > > Tetsuo has noticed that "mm, oom: cgroup-aware OOM killer" can lead to a > > > pre-mature allocation failure if the cgroup aware oom killer is enabled > > > and select_victim_memcg doesn't pick up any memcg to kill because there > > > is a memcg already being killed. oc->chosen_memcg will become INFLIGHT_VICTIM > > > and oom_kill_memcg_victim will bail out early. oc->chosen_task will > > > stay NULL, however, and out_of_memory will therefore return false which > > > forces __alloc_pages_may_oom to not set did_some_progress and the page > > > allocator backs out and fails the allocation. > > > U > > > Fix this by checking both chosen_task and chosen_memcg in out_of_memory > > > and return false only when _both_ are NULL. > > > > I don't like this patch. It is not easy to understand and is fragile to > > future changes. Currently the only case !!oc->chosen can become false is that > > there was no eligible tasks when SysRq-f was requested or memcg OOM occurred. > > Well, the current contract is not easy unfortunatelly. We have two > different modes of operation. We are either killing whole cgroups or a > task from a cgroup. In any case, the contract says that if we have any > killable entity then at least one of chosen* is set to INFLIGHT_VICTIM. > Other than that one of them has to be !NULL or we have no eligible > killable entity. The return value reflects all these cases. Btw. if your concern is the readability then we can add a helper and decsribe all the above in the comment.
Hi Michal, I love your patch! Yet something to improve: [auto build test ERROR on mmotm/master] [also build test ERROR on next-20180531] [cannot apply to v4.17-rc7] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-memcg-oom-fix-pre-mature-allocation-failures/20180531-220120 base: git://git.cmpxchg.org/linux-mmotm.git master config: i386-randconfig-x078-201821 (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): mm/oom_kill.c: In function 'out_of_memory': >> mm/oom_kill.c:1157:28: error: invalid operands to binary | (have 'struct task_struct *' and 'struct mem_cgroup *') return !!(oc->chosen_task | oc->chosen_memcg); ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~ >> mm/oom_kill.c:1158:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ vim +1157 mm/oom_kill.c 1068 1069 /** 1070 * out_of_memory - kill the "best" process when we run out of memory 1071 * @oc: pointer to struct oom_control 1072 * 1073 * If we run out of memory, we have the choice between either 1074 * killing a random task (bad), letting the system crash (worse) 1075 * OR try to be smart about which process to kill. Note that we 1076 * don't have to be perfect here, we just have to be good. 1077 */ 1078 bool out_of_memory(struct oom_control *oc) 1079 { 1080 unsigned long freed = 0; 1081 enum oom_constraint constraint = CONSTRAINT_NONE; 1082 bool delay = false; /* if set, delay next allocation attempt */ 1083 1084 if (oom_killer_disabled) 1085 return false; 1086 1087 if (!is_memcg_oom(oc)) { 1088 blocking_notifier_call_chain(&oom_notify_list, 0, &freed); 1089 if (freed > 0) 1090 /* Got some memory back in the last second. */ 1091 return true; 1092 } 1093 1094 /* 1095 * If current has a pending SIGKILL or is exiting, then automatically 1096 * select it. The goal is to allow it to allocate so that it may 1097 * quickly exit and free its memory. 1098 */ 1099 if (task_will_free_mem(current)) { 1100 mark_oom_victim(current); 1101 wake_oom_reaper(current); 1102 return true; 1103 } 1104 1105 /* 1106 * The OOM killer does not compensate for IO-less reclaim. 1107 * pagefault_out_of_memory lost its gfp context so we have to 1108 * make sure exclude 0 mask - all other users should have at least 1109 * ___GFP_DIRECT_RECLAIM to get here. 1110 */ 1111 if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) 1112 return true; 1113 1114 /* 1115 * Check if there were limitations on the allocation (only relevant for 1116 * NUMA and memcg) that may require different handling. 1117 */ 1118 constraint = constrained_alloc(oc); 1119 if (constraint != CONSTRAINT_MEMORY_POLICY) 1120 oc->nodemask = NULL; 1121 check_panic_on_oom(oc, constraint); 1122 1123 if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task && 1124 current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) && 1125 current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) { 1126 get_task_struct(current); 1127 oc->chosen_task = current; 1128 oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)"); 1129 return true; 1130 } 1131 1132 if (mem_cgroup_select_oom_victim(oc) && oom_kill_memcg_victim(oc)) { 1133 delay = true; 1134 goto out; 1135 } 1136 1137 select_bad_process(oc); 1138 /* Found nothing?!?! Either we hang forever, or we panic. */ 1139 if (!oc->chosen_task && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { 1140 dump_header(oc, NULL); 1141 panic("Out of memory and no killable processes...\n"); 1142 } 1143 if (oc->chosen_task && oc->chosen_task != INFLIGHT_VICTIM) { 1144 oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" : 1145 "Memory cgroup out of memory"); 1146 delay = true; 1147 } 1148 1149 out: 1150 /* 1151 * Give the killed process a good chance to exit before trying 1152 * to allocate memory again. 1153 */ 1154 if (delay) 1155 schedule_timeout_killable(1); 1156 > 1157 return !!(oc->chosen_task | oc->chosen_memcg); > 1158 } 1159 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 565e7da55318..fc06af041447 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1174,7 +1174,7 @@ bool out_of_memory(struct oom_control *oc) if (delay) schedule_timeout_killable(1); - return !!oc->chosen_task; + return !!(oc->chosen_task | oc->chosen_memcg); } /*