Message ID | 1526540428-12178-1-git-send-email-ufo19890607@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu 17-05-18 08:00:28, ufo19890607 wrote:
> From: yuzhoujian <yuzhoujian@didichuxing.com>
>
> The dump_header does not print the memcg's name when the system
> oom happened. Some users want to locate the certain container
> which contains the task that has been killed by the oom killer.
> So I add the mem_cgroup_print_oom_info when system oom events
> happened.

The oom report is quite heavy today. Do we really need the full memcg
oom report here. Wouldn't it be sufficient to print the memcg the task
belongs to?

> Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8ba6cb88cf58..244416c9834a 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
>  	if (is_memcg_oom(oc))
>  		mem_cgroup_print_oom_info(oc->memcg, p);
>  	else {
> +		mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
>  		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
>  		if (is_dump_unreclaim_slabs())
>  			dump_unreclaimable_slab();
> --
> 2.14.1
Hi Michal,

I think the current OOM report is incomplete. I can get the task which
invoked the oom-killer and the task which has been killed by the
oom-killer, as well as the memory info when the oom happened. But I cannot
infer the memcg to which the task killed by the oom-killer belongs, because
that task has been killed and dump_tasks will print all of the tasks in the
system.

mem_cgroup_print_oom_info will print five lines of content including the
memcg's name, usage and limit. I don't think five lines of content will
cause a big problem. Or it could at least print the memcg's name.

Thanks
Wind

Michal Hocko <mhocko@kernel.org> wrote on Thursday, May 17, 2018 at 3:11 PM:

> On Thu 17-05-18 08:00:28, ufo19890607 wrote:
> > From: yuzhoujian <yuzhoujian@didichuxing.com>
> >
> > The dump_header does not print the memcg's name when the system
> > oom happened. Some users want to locate the certain container
> > which contains the task that has been killed by the oom killer.
> > So I add the mem_cgroup_print_oom_info when system oom events
> > happened.
>
> The oom report is quite heavy today. Do we really need the full memcg
> oom report here. Wouldn't it be sufficient to print the memcg the task
> belongs to?
>
> > Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
> > ---
> >  mm/oom_kill.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 8ba6cb88cf58..244416c9834a 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> >  	if (is_memcg_oom(oc))
> >  		mem_cgroup_print_oom_info(oc->memcg, p);
> >  	else {
> > +		mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
> >  		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
> >  		if (is_dump_unreclaim_slabs())
> >  			dump_unreclaimable_slab();
> > --
> > 2.14.1
>
> --
> Michal Hocko
> SUSE Labs
On Thu 17-05-18 17:44:43, 禹舟键 wrote:
> Hi Michal,
>
> I think the current OOM report is incomplete. I can get the task which
> invoked the oom-killer and the task which has been killed by the
> oom-killer, as well as the memory info when the oom happened. But I cannot
> infer the memcg to which the task killed by the oom-killer belongs, because
> that task has been killed and dump_tasks will print all of the tasks in the
> system.

I can see how the origin memcg might be useful, but ...

> mem_cgroup_print_oom_info will print five lines of content including the
> memcg's name, usage and limit. I don't think five lines of content will
> cause a big problem. Or it could at least print the memcg's name.

this is not 5 lines at all. We dump memcg stats for the whole oom memcg
subtree. For your patch it would be the whole subtree of the memcg of
the oom victim. With cgroup v1 this can be quite deep as tasks can
belong to inter-nodes as well. Would the

	pr_info("Task in ");
	pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
	pr_cont(" killed as a result of limit of ");

part of that output be sufficient for your usecase? You will not get memory
consumption of the group but is that really so relevant when we are
killing individual tasks? Please note that there are proposals to make
the global oom killer memcg aware and select by the memcg size rather
than pick on random tasks
(http://lkml.kernel.org/r/20171130152824.1591-1-guro@fb.com). Maybe that
will be more interesting for your container usecase.
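A rough sketch of how the suggestion above could be wired into dump_header() for the global OOM path (illustrative only, not a patch posted in this thread; the helper name dump_oom_victim_memcg() is invented here, and the rcu_read_lock() mirrors what mem_cgroup_print_oom_info() already does around task_cgroup()):

/*
 * Illustrative sketch only: print just the oom victim's memcg path
 * instead of the full memcg OOM report. Meant as a fragment inside
 * mm/oom_kill.c; the helper name is made up for this example.
 */
static void dump_oom_victim_memcg(struct task_struct *p)
{
	/* task_cgroup() must be used under rcu_read_lock() */
	rcu_read_lock();
	pr_info("Task in ");
	pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
	pr_cont(" killed as a result of a global OOM\n");
	rcu_read_unlock();
}

dump_header() could then call this in the !is_memcg_oom(oc) branch instead of the full mem_cgroup_print_oom_info() call from the posted patch.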
On Thu, May 17, 2018 at 12:23:30PM +0200, Michal Hocko wrote:
> On Thu 17-05-18 17:44:43, 禹舟键 wrote:
> > Hi Michal,
> >
> > I think the current OOM report is incomplete. I can get the task which
> > invoked the oom-killer and the task which has been killed by the
> > oom-killer, as well as the memory info when the oom happened. But I cannot
> > infer the memcg to which the task killed by the oom-killer belongs, because
> > that task has been killed and dump_tasks will print all of the tasks in the
> > system.
>
> I can see how the origin memcg might be useful, but ...
>
> > mem_cgroup_print_oom_info will print five lines of content including the
> > memcg's name, usage and limit. I don't think five lines of content will
> > cause a big problem. Or it could at least print the memcg's name.

I only want to add here that if a system-wide OOM is a rare event, you can
look at the per-cgroup oom counters to find the cgroup which contained the
killed task. Not super handy, but it might work for debug purposes.

> this is not 5 lines at all. We dump memcg stats for the whole oom memcg
> subtree. For your patch it would be the whole subtree of the memcg of
> the oom victim. With cgroup v1 this can be quite deep as tasks can
> belong to inter-nodes as well. Would the
>
> 	pr_info("Task in ");
> 	pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> 	pr_cont(" killed as a result of limit of ");
>
> part of that output be sufficient for your usecase? You will not get memory
> consumption of the group but is that really so relevant when we are
> killing individual tasks? Please note that there are proposals to make
> the global oom killer memcg aware and select by the memcg size rather
> than pick on random tasks
> (http://lkml.kernel.org/r/20171130152824.1591-1-guro@fb.com). Maybe that
> will be more interesting for your container usecase.

Speaking about memcg OOM reports more broadly, IMO rather than spamming
dmesg with memcg-local OOM dumps, it's better to add a new interface for
reading memcg-specific OOM reports. The current dmesg OOM report contains
a lot of low-level stuff, which is handy for debugging system-wide OOM
issues, as well as memcg-aware stuff; that makes it bulky.

Anyway, Michal's one-line proposal looks quite acceptable to me.

Thanks!
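As a concrete illustration of the per-cgroup oom counters mentioned above, here is a small userspace sketch (written for this archive, not code from the thread). It assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a kernel that reports an "oom_kill" counter in memory.events; on a cgroup v1 hierarchy the counter, where available, lives in a different file, so the path would differ.

/*
 * Illustrative userspace sketch: read the oom_kill counter of a single
 * cgroup. Assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a
 * kernel that exposes "oom_kill" in memory.events.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[4096];
	char line[256];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <cgroup relative path>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/memory.events", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}

	/* memory.events is "key value" per line; pick out the oom_kill row */
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "oom_kill ", 9))
			printf("%s: %s", argv[1], line);
	}

	fclose(f);
	return 0;
}

Iterating this over all cgroups and looking for a counter that increased around the time of the kill is essentially the workaround described above.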
Hi yuzhoujian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc5 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:      https://github.com/0day-ci/linux/commits/ufo19890607/Add-the-memcg-print-oom-info-for-system-oom/20180520-041924
base:     git://git.cmpxchg.org/linux-mmotm.git master
config:   x86_64-randconfig-x003-201820 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All error/warnings (new ones prefixed by >>):

   mm/oom_kill.c: In function 'dump_header':
>> mm/oom_kill.c:436:29: error: implicit declaration of function 'mem_cgroup_from_task'; did you mean 'mem_cgroup_from_id'? [-Werror=implicit-function-declaration]
      mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
                                ^~~~~~~~~~~~~~~~~~~~
                                mem_cgroup_from_id
>> mm/oom_kill.c:436:29: warning: passing argument 1 of 'mem_cgroup_print_oom_info' makes pointer from integer without a cast [-Wint-conversion]
   In file included from include/linux/swap.h:9:0,
                    from mm/oom_kill.c:28:
   include/linux/memcontrol.h:932:1: note: expected 'struct mem_cgroup *' but argument is of type 'int'
    mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
    ^~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +436 mm/oom_kill.c

   421
   422	static void dump_header(struct oom_control *oc, struct task_struct *p)
   423	{
   424		pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n",
   425			current->comm, oc->gfp_mask, &oc->gfp_mask,
   426			nodemask_pr_args(oc->nodemask), oc->order,
   427			current->signal->oom_score_adj);
   428		if (!IS_ENABLED(CONFIG_COMPACTION) && oc->order)
   429			pr_warn("COMPACTION is disabled!!!\n");
   430
   431		cpuset_print_current_mems_allowed();
   432		dump_stack();
   433		if (is_memcg_oom(oc))
   434			mem_cgroup_print_oom_info(oc->memcg, p);
   435		else {
 > 436			mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
   437			show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
   438			if (is_dump_unreclaim_slabs())
   439				dump_unreclaimable_slab();
   440		}
   441		if (sysctl_oom_dump_tasks)
   442			dump_tasks(oc->memcg, oc->nodemask);
   443	}
   444

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
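For context (an interpretation of the report above, not something stated by the bot): the posted patch calls mem_cgroup_from_task() from mm/oom_kill.c, but nothing visible to oom_kill.c declares that function in this configuration, hence the implicit-declaration error and the follow-on pointer/int warning. A sketch of the kind of header change that might be needed is below; whether a plain declaration is enough for CONFIG_MEMCG=y builds depends on how mem_cgroup_from_task() is defined in mm/memcontrol.c in this tree, so treat it purely as an illustration, not a posted patch.

/*
 * Sketch only, for include/linux/memcontrol.h: make the helper visible
 * to mm/oom_kill.c in both configurations. The !CONFIG_MEMCG stub returns
 * NULL, which the empty mem_cgroup_print_oom_info() stub ignores anyway.
 */
#ifdef CONFIG_MEMCG
struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
#else
static inline struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
{
	return NULL;
}
#endif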
On Thu, 17 May 2018, Michal Hocko wrote:

> this is not 5 lines at all. We dump memcg stats for the whole oom memcg
> subtree. For your patch it would be the whole subtree of the memcg of
> the oom victim. With cgroup v1 this can be quite deep as tasks can
> belong to inter-nodes as well. Would the
>
> 	pr_info("Task in ");
> 	pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> 	pr_cont(" killed as a result of limit of ");
>
> part of that output be sufficient for your usecase?

There's no memcg to print as the limit in the above, but it does seem like
the single line output is all that is needed in this case.

It might be useful to discuss a single line output that specifies relevant
information about the context of the oom kill, the killed thread, and the
memcg of that thread, in a way that will be backwards compatible. The
messages in the oom killer have been restructured over time; I don't
believe there is a backwards compatible way to search for an oom event in
the kernel log.

I've had success with defining a single line output that includes the
CONSTRAINT_* of the oom kill, the origin and kill memcgs, the thread name,
pid, and uid. On system oom kills, origin and kill memcgs are left empty.

oom-kill constraint=CONSTRAINT_* origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>

Perhaps we should introduce a single line output that will be backwards
compatible and includes this information?
On Mon 21-05-18 14:11:21, David Rientjes wrote:
> On Thu, 17 May 2018, Michal Hocko wrote:
>
> > this is not 5 lines at all. We dump memcg stats for the whole oom memcg
> > subtree. For your patch it would be the whole subtree of the memcg of
> > the oom victim. With cgroup v1 this can be quite deep as tasks can
> > belong to inter-nodes as well. Would the
> >
> > 	pr_info("Task in ");
> > 	pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> > 	pr_cont(" killed as a result of limit of ");
> >
> > part of that output be sufficient for your usecase?
>
> There's no memcg to print as the limit in the above, but it does seem like
> the single line output is all that is needed in this case.

Yeah, that is exactly what I was proposing. I just copy&pasted the whole
part to make it clear which part of mem_cgroup_print_oom_info I meant.
Referring to "killed as a result of limit of" was misleading. Sorry about
that.

> It might be useful to discuss a single line output that specifies relevant
> information about the context of the oom kill, the killed thread, and the
> memcg of that thread, in a way that will be backwards compatible. The
> messages in the oom killer have been restructured over time; I don't
> believe there is a backwards compatible way to search for an oom event in
> the kernel log.

Agreed

> I've had success with defining a single line output that includes the
> CONSTRAINT_* of the oom kill, the origin and kill memcgs, the thread name,
> pid, and uid. On system oom kills, origin and kill memcgs are left empty.
>
> oom-kill constraint=CONSTRAINT_* origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>
>
> Perhaps we should introduce a single line output that will be backwards
> compatible and includes this information?

I do not have a strong preference here. We already print the cpuset on its
own line and we can do the same for the memcg.
On Tue, 22 May 2018, Michal Hocko wrote:

> > I've had success with defining a single line output that includes the
> > CONSTRAINT_* of the oom kill, the origin and kill memcgs, the thread name,
> > pid, and uid. On system oom kills, origin and kill memcgs are left empty.
> >
> > oom-kill constraint=CONSTRAINT_* origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>
> >
> > Perhaps we should introduce a single line output that will be backwards
> > compatible and includes this information?
>
> I do not have a strong preference here. We already print the cpuset on its
> own line and we can do the same for the memcg.

Yes, for both the memcg that has reached its limit (origin_memcg) and the
memcg the killed process is attached to (kill_memcg). It's beneficial to
have a single-line output, to avoid any printk interleaving or
ratelimiting, that includes the constraint, comm, and at least the pid.
(We include uid simply to find oom kills of root processes.)

We already have all of this information, including the cpuset, the cpuset
nodemask, and the allocation nodemask for mempolicy ooms. The only
exception appears to be the kill_memcg for CONSTRAINT_NONE, and emitting it
in a way that can't be interleaved or suppressed. Perhaps we can have this?

oom-kill constraint=CONSTRAINT_* nodemask=<cpuset/mempolicy nodemask> origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>

For CONSTRAINT_NONE, nodemask and origin_memcg are empty. For
CONSTRAINT_CPUSET and CONSTRAINT_MEMORY_POLICY, origin_memcg is empty.
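To make the shape of that proposal concrete, here is an illustrative sketch of how such a single line might be printed from mm/oom_kill.c. The helper name dump_oom_summary() and the oom_constraint_text[] lookup table are assumptions made for this example (the thread has not settled on a format), the memcg path strings are assumed to be prepared by the caller, and the nodemask field could be added the same way dump_header() already prints oc->nodemask.

/*
 * Illustrative sketch of the proposed single-line, grep-friendly oom
 * report. oom_constraint_text[] and the pre-formatted origin/kill memcg
 * path strings are assumed to exist; empty strings stand in for fields
 * that do not apply to a given constraint.
 */
static void dump_oom_summary(enum oom_constraint constraint,
			     const char *origin_memcg,
			     const char *kill_memcg,
			     struct task_struct *victim)
{
	pr_info("oom-kill constraint=%s origin_memcg=%s kill_memcg=%s task=%s pid=%d uid=%u\n",
		oom_constraint_text[constraint],
		origin_memcg ? origin_memcg : "",
		kill_memcg ? kill_memcg : "",
		victim->comm, victim->pid,
		from_kuid(&init_user_ns, task_uid(victim)));
}

Because the whole record comes from a single pr_info() call, the fields could not be interleaved by other printk output, which is the backwards-compatibility and grep-ability property being discussed above.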
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..244416c9834a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 	if (is_memcg_oom(oc))
 		mem_cgroup_print_oom_info(oc->memcg, p);
 	else {
+		mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
 		if (is_dump_unreclaim_slabs())
 			dump_unreclaimable_slab();