Message ID | 1538226387-16600-1-git-send-email-ufo19890607@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v15,1/2] Reorganize the oom report in dump_header | expand |
On Sat 29-09-18 21:06:26, ufo19890607@gmail.com wrote: [...] > Changes since v14: > - add the dump_oom_summary for the single line output of oom context. > - fix the null pointer in the dump_header. I do not remember details about this null ptr but the fix you seemed to have done is [...] > +static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim) > +{ > + /* one line summary of the oom killer context. */ > + pr_info("oom-kill:constraint=%s,nodemask=%*pbl", > + oom_constraint_text[oc->constraint], > + nodemask_pr_args(oc->nodemask)); > + cpuset_print_current_mems_allowed(); > + pr_cont(",task=%s,pid=%d,uid=%d\n", victim->comm, victim->pid, > + from_kuid(&init_user_ns, task_uid(victim))); > +} > + > /* > * Number of OOM victims in flight > */ > @@ -951,6 +960,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message) > > if (__ratelimit(&oom_rs)) > dump_header(oc, p); > + if (oc) > + dump_oom_summary(oc, victim); > this? If yes then this is bogus because oc is never NULL. Besides that, you used to have this one line summary in dump_header which looks much better fit to me than oom_kill_process.
Hi Michal The null pointer is possible when calling the dump_header, this bug was detected by LKP. Below is the context 3 months ago. On Mon 30-07-18 19:05:50, David Rientjes wrote: > On Mon, 30 Jul 2018, Michal Hocko wrote: > > > On Mon 30-07-18 17:03:20, kernel test robot wrote: > > [...] > > > [ 9.034310] BUG: KASAN: null-ptr-deref in dump_header+0x10c/0x448 > > > > Could you faddr2line on the offset please? > > > > It's possible that p is NULL when calling dump_header(). In this case we > do not want to print any line concerning a victim because no oom kill has > occurred. > You are right. I have missed those. > This code shouldn't be part of dump_header(), which is called from > multiple contexts even when an oom kill has not occurred, and is > ratelimited. The single line output should be the canonical way that > userspace parses the log for oom victims, we can't ratelimit it. > > The following would be a fix patch, but it will be broken if the cgroup > aware oom killer is removed from -mm so that the oom_group stuff can be > merged. > cgroup aware oom killer is going to be replaced by a new implementation > IIUC so the fix should be based on the yuzhoujian patch. Ideally to be > resubmitted. > I would just suggest adding it into a function > dump_oom_summary(struct oom_control *oc, struct task_struct *p) > yuzhoujian could you take care of that please? I followed David's tip and call the new func dump_oom_summary in the oom_kill_process. > It's possible that p is NULL when calling dump_header(). In this case we > do not want to print any line concerning a victim because no oom kill has >occurred. > This code shouldn't be part of dump_header(), which is called from > multiple contexts even when an oom kill has not occurred, and is > ratelimited. The single line output should be the canonical way that > userspace parses the log for oom victims, we can't ratelimit it. > The following would be a fix patch, but it will be broken if the cgroup > aware oom killer is removed from -mm so that the oom_group stuff can be > merged. <div dir="ltr">Hi Michal<div>The null pointer is possible when calling the dump_header, this bug was detected by LKP. Below is the context 3 months ago.</div><div><br></div><div><br></div><div><span class="gmail-im" style="color:rgb(80,0,80)">On Mon 30-07-18 19:05:50, David Rientjes wrote:<br>> On Mon, 30 Jul 2018, Michal Hocko wrote:<br>> <br>> > On Mon 30-07-18 17:03:20, kernel test robot wrote:<br>> > [...]<br>> > > [ 9.034310] BUG: KASAN: null-ptr-deref in dump_header+0x10c/0x448<br>> > <br>> > Could you faddr2line on the offset please?<br>> > <br>> <br>> It's possible that p is NULL when calling dump_header(). In this case we <br>> do not want to print any line concerning a victim because no oom kill has <br>> occurred.<br><br></span>> You are right. I have missed those.<span class="gmail-im" style="color:rgb(80,0,80)"><br><br>> This code shouldn't be part of dump_header(), which is called from <br>> multiple contexts even when an oom kill has not occurred, and is <br>> ratelimited. The single line output should be the canonical way that <br>> userspace parses the log for oom victims, we can't ratelimit it.<br>> <br>> The following would be a fix patch, but it will be broken if the cgroup <br>> aware oom killer is removed from -mm so that the oom_group stuff can be <br>> merged.<br><br></span>> cgroup aware oom killer is going to be replaced by a new implementation<br>> IIUC so the fix should be based on the yuzhoujian patch. Ideally to be<br>> resubmitted.<br><br>> I would just suggest adding it into a function<br>> dump_oom_summary(struct oom_control *oc, struct task_struct *p)<br><br>> yuzhoujian could you take care of that please? <br></div><div><br></div><div>I followed David's tip and call the new func dump_oom_summary in the oom_kill_process.</div><div><br></div><div>> It's possible that p is NULL when calling dump_header(). In this case we <br>> do not want to print any line concerning a victim because no oom kill has <br>>occurred.<br><br>> This code shouldn't be part of dump_header(), which is called from <br>> multiple contexts even when an oom kill has not occurred, and is <br>> ratelimited. The single line output should be the canonical way that <br>> userspace parses the log for oom victims, we can't ratelimit it.<br><br>> The following would be a fix patch, but it will be broken if the cgroup <br>> aware oom killer is removed from -mm so that the oom_group stuff can be <br>> merged. </div><div> <br></div></div>
On Thu 01-11-18 18:09:39, 禹舟键 wrote: > Hi Michal > The null pointer is possible when calling the dump_header, this bug was > detected by LKP. Below is the context 3 months ago. Yeah I remember it was 0day report but I coundn't find it in my email archive. Do you happen to have a message-id? Anyway if (__ratelimit(&oom_rs)) dump_header(oc, p); + if (oc) + dump_oom_summary(oc, victim); Clearly cannot solve any NULL ptr because oc is never NULL unless I am missing something terribly.
Hi Michal The message-id is as below https://lkml.org/lkml/2018/7/31/148 Thanks <div dir="ltr"><div dir="ltr">Hi Michal<div>The message-id is as below</div><div><a href="https://lkml.org/lkml/2018/7/31/148">https://lkml.org/lkml/2018/7/31/148</a><br></div><div><br></div><div>Thanks</div></div></div>
On Fri 02-11-18 14:18:59, 禹舟键 wrote: > Hi Michal > The message-id is as below > https://lkml.org/lkml/2018/7/31/148 David said : It's possible that p is NULL when calling dump_header(). In this case we : do not want to print any line concerning a victim because no oom kill has : occurred. This means that we should check for p rather than oc.
diff --git a/include/linux/oom.h b/include/linux/oom.h index 69864a5..d079920 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -15,6 +15,13 @@ struct mem_cgroup; struct task_struct; +enum oom_constraint { + CONSTRAINT_NONE, + CONSTRAINT_CPUSET, + CONSTRAINT_MEMORY_POLICY, + CONSTRAINT_MEMCG, +}; + /* * Details of the page allocation that triggered the oom killer that are used to * determine what should be killed. @@ -42,6 +49,9 @@ struct oom_control { unsigned long totalpages; struct task_struct *chosen; unsigned long chosen_points; + + /* Used to print the constraint info. */ + enum oom_constraint constraint; }; extern struct mutex oom_lock; diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 266f10c..9510a5b 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2666,9 +2666,9 @@ void cpuset_print_current_mems_allowed(void) rcu_read_lock(); cgrp = task_cs(current)->css.cgroup; - pr_info("%s cpuset=", current->comm); + pr_cont(",cpuset="); pr_cont_cgroup_name(cgrp); - pr_cont(" mems_allowed=%*pbl\n", + pr_cont(",mems_allowed=%*pbl", nodemask_pr_args(¤t->mems_allowed)); rcu_read_unlock(); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index f10aa53..0935fca 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -245,11 +245,11 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, return points > 0 ? points : 1; } -enum oom_constraint { - CONSTRAINT_NONE, - CONSTRAINT_CPUSET, - CONSTRAINT_MEMORY_POLICY, - CONSTRAINT_MEMCG, +static const char * const oom_constraint_text[] = { + [CONSTRAINT_NONE] = "CONSTRAINT_NONE", + [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET", + [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY", + [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG", }; /* @@ -430,14 +430,12 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask) static void dump_header(struct oom_control *oc, struct task_struct *p) { - pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n", - current->comm, oc->gfp_mask, &oc->gfp_mask, - nodemask_pr_args(oc->nodemask), oc->order, + pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n", + current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order, current->signal->oom_score_adj); if (!IS_ENABLED(CONFIG_COMPACTION) && oc->order) pr_warn("COMPACTION is disabled!!!\n"); - cpuset_print_current_mems_allowed(); dump_stack(); if (is_memcg_oom(oc)) mem_cgroup_print_oom_info(oc->memcg, p); @@ -450,6 +448,17 @@ static void dump_header(struct oom_control *oc, struct task_struct *p) dump_tasks(oc->memcg, oc->nodemask); } +static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim) +{ + /* one line summary of the oom killer context. */ + pr_info("oom-kill:constraint=%s,nodemask=%*pbl", + oom_constraint_text[oc->constraint], + nodemask_pr_args(oc->nodemask)); + cpuset_print_current_mems_allowed(); + pr_cont(",task=%s,pid=%d,uid=%d\n", victim->comm, victim->pid, + from_kuid(&init_user_ns, task_uid(victim))); +} + /* * Number of OOM victims in flight */ @@ -951,6 +960,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message) if (__ratelimit(&oom_rs)) dump_header(oc, p); + if (oc) + dump_oom_summary(oc, victim); pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n", message, task_pid_nr(p), p->comm, points); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 89d2a2a..ff18663 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3416,13 +3416,13 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...) va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; - pr_warn("%s: %pV, mode:%#x(%pGg), nodemask=%*pbl\n", + pr_warn("%s: %pV, mode:%#x(%pGg), nodemask=%*pbl", current->comm, &vaf, gfp_mask, &gfp_mask, nodemask_pr_args(nodemask)); va_end(args); cpuset_print_current_mems_allowed(); - + pr_cont("\n"); dump_stack(); warn_alloc_show_mem(gfp_mask, nodemask); }