diff mbox series

mm, memcg: show memcg min setting in oom messages

Message ID 1574239985-1916-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived
Headers show
Series mm, memcg: show memcg min setting in oom messages | expand

Commit Message

Yafang Shao Nov. 20, 2019, 8:53 a.m. UTC
A task running in a memcg may OOM because of the memory.min settings of his
slibing and parent. If this happens, the current oom messages can't show
why file page cache can't be reclaimed. So it is better to show the memcg
min settings.
Let's take an example.
      bar    bar/memory.max = 1200M memory.min=800M
     /  \
   barA barB barA/memory.min = 800M memory.current=1G (file page cache)
             barB/memory.min = 0 (process in this memcg is allocating page)

The process will do memcg reclaim if the bar/memory.max is reached. Once
the barA/memory.min is reached it will stop reclaiming file page caches in
barA, and if there is no reclaimable pages in bar and bar/barB it will
enter memcg OOM then.
After this pacch, bellow messages will be show then (only includeing the
relevant messages here). The lines begin with '#' are newly added info (the
'#' symbol is not in the original messages).
	memory: usage 1228800kB, limit 1228800kB, failcnt 18337
	...
	# Memory cgroup min setting:
	# /bar: min 819200KB emin 0KB
	# /bar/barA: min 819200KB emin 819200KB
	# /bar/barB: min 0KB emin 0KB
	...
	Memory cgroup stats for /bar:
	anon 418328576
	file 835756032
	...
	unevictable 0
	...
	oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB

With the new added information, we can find the memory.min in bar/barA is
reached and the processes in bar/barB can't reclaim file page cache from
bar/barA any more. While without this new added information we don't know
why the file page cache in bar can't be reclaimed.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/memcontrol.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Michal Hocko Nov. 20, 2019, 10:21 a.m. UTC | #1
On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> A task running in a memcg may OOM because of the memory.min settings of his
> slibing and parent. If this happens, the current oom messages can't show
> why file page cache can't be reclaimed.

min limit is not the only way to protect memory from being reclaim. The
memory might be pinned or unreclaimable for other reasons (e.g. swap
quota exceeded for memcg). Besides that, there is the very same problem
with the global OOM killer, right? And I do not expect we want to print
all memcgs in the system (this might be hundreds).

> So it is better to show the memcg
> min settings.
> Let's take an example.
>       bar    bar/memory.max = 1200M memory.min=800M
>      /  \
>    barA barB barA/memory.min = 800M memory.current=1G (file page cache)
>              barB/memory.min = 0 (process in this memcg is allocating page)
> 
> The process will do memcg reclaim if the bar/memory.max is reached. Once
> the barA/memory.min is reached it will stop reclaiming file page caches in
> barA, and if there is no reclaimable pages in bar and bar/barB it will
> enter memcg OOM then.
> After this pacch, bellow messages will be show then (only includeing the
> relevant messages here). The lines begin with '#' are newly added info (the
> '#' symbol is not in the original messages).
> 	memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> 	...
> 	# Memory cgroup min setting:
> 	# /bar: min 819200KB emin 0KB
> 	# /bar/barA: min 819200KB emin 819200KB
> 	# /bar/barB: min 0KB emin 0KB
> 	...
> 	Memory cgroup stats for /bar:
> 	anon 418328576
> 	file 835756032
> 	...
> 	unevictable 0
> 	...
> 	oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB
> 
> With the new added information, we can find the memory.min in bar/barA is
> reached and the processes in bar/barB can't reclaim file page cache from
> bar/barA any more. While without this new added information we don't know
> why the file page cache in bar can't be reclaimed.

Well, I am not sure this is really usefull enough TBH. It doesn't give
you the whole picture and it potentially generates a lot of output in
the oom report. FYI we used to have a more precise break down of
counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more
precise dump info while memcg oom happening") which later got rewritten
by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM")

Could you be more specific why do you really need this piece of
information?

> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  mm/memcontrol.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
Yafang Shao Nov. 20, 2019, 10:53 a.m. UTC | #2
On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > A task running in a memcg may OOM because of the memory.min settings of his
> > slibing and parent. If this happens, the current oom messages can't show
> > why file page cache can't be reclaimed.
>
> min limit is not the only way to protect memory from being reclaim. The
> memory might be pinned or unreclaimable for other reasons (e.g. swap
> quota exceeded for memcg).

Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
If something else can prevent the file cache being reclaimed, we'd
better show them as well.

> Besides that, there is the very same problem
> with the global OOM killer, right? And I do not expect we want to print
> all memcgs in the system (this might be hundreds).
>

I forgot the global oom...

Why not just print the memcgs which are under memory.min protection or
something like a total number of min protected memory ?

> > So it is better to show the memcg
> > min settings.
> > Let's take an example.
> >       bar    bar/memory.max = 1200M memory.min=800M
> >      /  \
> >    barA barB barA/memory.min = 800M memory.current=1G (file page cache)
> >              barB/memory.min = 0 (process in this memcg is allocating page)
> >
> > The process will do memcg reclaim if the bar/memory.max is reached. Once
> > the barA/memory.min is reached it will stop reclaiming file page caches in
> > barA, and if there is no reclaimable pages in bar and bar/barB it will
> > enter memcg OOM then.
> > After this pacch, bellow messages will be show then (only includeing the
> > relevant messages here). The lines begin with '#' are newly added info (the
> > '#' symbol is not in the original messages).
> >       memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> >       ...
> >       # Memory cgroup min setting:
> >       # /bar: min 819200KB emin 0KB
> >       # /bar/barA: min 819200KB emin 819200KB
> >       # /bar/barB: min 0KB emin 0KB
> >       ...
> >       Memory cgroup stats for /bar:
> >       anon 418328576
> >       file 835756032
> >       ...
> >       unevictable 0
> >       ...
> >       oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB
> >
> > With the new added information, we can find the memory.min in bar/barA is
> > reached and the processes in bar/barB can't reclaim file page cache from
> > bar/barA any more. While without this new added information we don't know
> > why the file page cache in bar can't be reclaimed.
>
> Well, I am not sure this is really usefull enough TBH. It doesn't give
> you the whole picture and it potentially generates a lot of output in
> the oom report. FYI we used to have a more precise break down of
> counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more
> precise dump info while memcg oom happening") which later got rewritten
> by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM")
>

At least we'd better print a total protected memory in the oom messages.

> Could you be more specific why do you really need this piece of
> information?

I have said in the commit log, that we don't know why the file cache
can't be reclaimed (when evictable is 0 and dirty is 0 as well.)

>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  mm/memcontrol.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Nov. 20, 2019, 11:40 a.m. UTC | #3
On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > A task running in a memcg may OOM because of the memory.min settings of his
> > > slibing and parent. If this happens, the current oom messages can't show
> > > why file page cache can't be reclaimed.
> >
> > min limit is not the only way to protect memory from being reclaim. The
> > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > quota exceeded for memcg).
> 
> Both swap or unreclaimabed (unevicteable) is printed in OOM messages.

Not really. Consider a memcg which has reached it's swap limit. The
anonymous memory is not really reclaimable even when there is a lot of
swap space available.

> If something else can prevent the file cache being reclaimed, we'd
> better show them as well.

How are you going to do that? How do you track pins on pages?

> > Besides that, there is the very same problem
> > with the global OOM killer, right? And I do not expect we want to print
> > all memcgs in the system (this might be hundreds).
> >
> 
> I forgot the global oom...
> 
> Why not just print the memcgs which are under memory.min protection or
> something like a total number of min protected memory ?

Yes, this would likely help. But the main question really reamains, is
this really worth it?

> > > So it is better to show the memcg
> > > min settings.
> > > Let's take an example.
> > >       bar    bar/memory.max = 1200M memory.min=800M
> > >      /  \
> > >    barA barB barA/memory.min = 800M memory.current=1G (file page cache)
> > >              barB/memory.min = 0 (process in this memcg is allocating page)
> > >
> > > The process will do memcg reclaim if the bar/memory.max is reached. Once
> > > the barA/memory.min is reached it will stop reclaiming file page caches in
> > > barA, and if there is no reclaimable pages in bar and bar/barB it will
> > > enter memcg OOM then.
> > > After this pacch, bellow messages will be show then (only includeing the
> > > relevant messages here). The lines begin with '#' are newly added info (the
> > > '#' symbol is not in the original messages).
> > >       memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > >       ...
> > >       # Memory cgroup min setting:
> > >       # /bar: min 819200KB emin 0KB
> > >       # /bar/barA: min 819200KB emin 819200KB
> > >       # /bar/barB: min 0KB emin 0KB
> > >       ...
> > >       Memory cgroup stats for /bar:
> > >       anon 418328576
> > >       file 835756032
> > >       ...
> > >       unevictable 0
> > >       ...
> > >       oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB
> > >
> > > With the new added information, we can find the memory.min in bar/barA is
> > > reached and the processes in bar/barB can't reclaim file page cache from
> > > bar/barA any more. While without this new added information we don't know
> > > why the file page cache in bar can't be reclaimed.
> >
> > Well, I am not sure this is really usefull enough TBH. It doesn't give
> > you the whole picture and it potentially generates a lot of output in
> > the oom report. FYI we used to have a more precise break down of
> > counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more
> > precise dump info while memcg oom happening") which later got rewritten
> > by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM")
> >
> 
> At least we'd better print a total protected memory in the oom messages.
> 
> > Could you be more specific why do you really need this piece of
> > information?
> 
> I have said in the commit log, that we don't know why the file cache
> can't be reclaimed (when evictable is 0 and dirty is 0 as well.)

And the counter argument is that this will not help you there much in
many large and much more common cases.

I argue, and I might be wrong here so feel free to correct me, that the
reclaim protection guarantee (min) is something to be under admins
control. It shouldn't really happen nilly-willy because it has really
large consequences, the OOM including. So if there is a suspicious
amount of memory that could be reclaimed normally then the reclaim
protection is really the first suspect to go after.
Yafang Shao Nov. 20, 2019, 12:23 p.m. UTC | #4
On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > A task running in a memcg may OOM because of the memory.min settings of his
> > > > slibing and parent. If this happens, the current oom messages can't show
> > > > why file page cache can't be reclaimed.
> > >
> > > min limit is not the only way to protect memory from being reclaim. The
> > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > quota exceeded for memcg).
> >
> > Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
>
> Not really. Consider a memcg which has reached it's swap limit. The
> anonymous memory is not really reclaimable even when there is a lot of
> swap space available.
>

The memcg swap limit is already printed in oom messages, see bellow,

[  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
[  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0

> > If something else can prevent the file cache being reclaimed, we'd
> > better show them as well.
>
> How are you going to do that? How do you track pins on pages?
>

Actually I don't have a clear idea how to track pins on pages yet.


> > > Besides that, there is the very same problem
> > > with the global OOM killer, right? And I do not expect we want to print
> > > all memcgs in the system (this might be hundreds).
> > >
> >
> > I forgot the global oom...
> >
> > Why not just print the memcgs which are under memory.min protection or
> > something like a total number of min protected memory ?
>
> Yes, this would likely help. But the main question really reamains, is
> this really worth it?
>

If it doesn't cost too much, I think it is worth to do it.
As the oom path is not the critical path, so adding some print info
should not add much overhead.

> > > > So it is better to show the memcg
> > > > min settings.
> > > > Let's take an example.
> > > >       bar    bar/memory.max = 1200M memory.min=800M
> > > >      /  \
> > > >    barA barB barA/memory.min = 800M memory.current=1G (file page cache)
> > > >              barB/memory.min = 0 (process in this memcg is allocating page)
> > > >
> > > > The process will do memcg reclaim if the bar/memory.max is reached. Once
> > > > the barA/memory.min is reached it will stop reclaiming file page caches in
> > > > barA, and if there is no reclaimable pages in bar and bar/barB it will
> > > > enter memcg OOM then.
> > > > After this pacch, bellow messages will be show then (only includeing the
> > > > relevant messages here). The lines begin with '#' are newly added info (the
> > > > '#' symbol is not in the original messages).
> > > >       memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > > >       ...
> > > >       # Memory cgroup min setting:
> > > >       # /bar: min 819200KB emin 0KB
> > > >       # /bar/barA: min 819200KB emin 819200KB
> > > >       # /bar/barB: min 0KB emin 0KB
> > > >       ...
> > > >       Memory cgroup stats for /bar:
> > > >       anon 418328576
> > > >       file 835756032
> > > >       ...
> > > >       unevictable 0
> > > >       ...
> > > >       oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB
> > > >
> > > > With the new added information, we can find the memory.min in bar/barA is
> > > > reached and the processes in bar/barB can't reclaim file page cache from
> > > > bar/barA any more. While without this new added information we don't know
> > > > why the file page cache in bar can't be reclaimed.
> > >
> > > Well, I am not sure this is really usefull enough TBH. It doesn't give
> > > you the whole picture and it potentially generates a lot of output in
> > > the oom report. FYI we used to have a more precise break down of
> > > counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more
> > > precise dump info while memcg oom happening") which later got rewritten
> > > by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM")
> > >
> >
> > At least we'd better print a total protected memory in the oom messages.
> >
> > > Could you be more specific why do you really need this piece of
> > > information?
> >
> > I have said in the commit log, that we don't know why the file cache
> > can't be reclaimed (when evictable is 0 and dirty is 0 as well.)
>
> And the counter argument is that this will not help you there much in
> many large and much more common cases.
>
> I argue, and I might be wrong here so feel free to correct me, that the
> reclaim protection guarantee (min) is something to be under admins
> control. It shouldn't really happen nilly-willy because it has really
> large consequences, the OOM including. So if there is a suspicious
> amount of memory that could be reclaimed normally then the reclaim
> protection is really the first suspect to go after.
> --

I don't know whether it happens nilly-willy or not.
But if we all know that it may cause OOMs and it don't take too much
effort to show it in the OOM messages,
we'd better show it. That can help the admins and the admins don't
need to suspect it any more.

Thanks
Yafang
Michal Hocko Nov. 22, 2019, 10:28 a.m. UTC | #5
On Wed 20-11-19 20:23:54, Yafang Shao wrote:
> On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > > A task running in a memcg may OOM because of the memory.min settings of his
> > > > > slibing and parent. If this happens, the current oom messages can't show
> > > > > why file page cache can't be reclaimed.
> > > >
> > > > min limit is not the only way to protect memory from being reclaim. The
> > > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > > quota exceeded for memcg).
> > >
> > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
> >
> > Not really. Consider a memcg which has reached it's swap limit. The
> > anonymous memory is not really reclaimable even when there is a lot of
> > swap space available.
> >
> 
> The memcg swap limit is already printed in oom messages, see bellow,
> 
> [  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> [  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0

But you do not have any insight on the swap limit down the oom
hierarchy, do you?

> > > Why not just print the memcgs which are under memory.min protection or
> > > something like a total number of min protected memory ?
> >
> > Yes, this would likely help. But the main question really reamains, is
> > this really worth it?
> >
> 
> If it doesn't cost too much, I think it is worth to do it.
> As the oom path is not the critical path, so adding some print info
> should not add much overhead.

Generating a lot of output for the oom reports has been a real problem
in many deployments.
[...]
> > > I have said in the commit log, that we don't know why the file cache
> > > can't be reclaimed (when evictable is 0 and dirty is 0 as well.)
> >
> > And the counter argument is that this will not help you there much in
> > many large and much more common cases.
> >
> > I argue, and I might be wrong here so feel free to correct me, that the
> > reclaim protection guarantee (min) is something to be under admins
> > control. It shouldn't really happen nilly-willy because it has really
> > large consequences, the OOM including. So if there is a suspicious
> > amount of memory that could be reclaimed normally then the reclaim
> > protection is really the first suspect to go after.
> > --
> 
> I don't know whether it happens nilly-willy or not.

It is a reclaim protection guarantee (so essentially an mlock like
thing) so it better have to be properly considered when used.

> But if we all know that it may cause OOMs and it don't take too much
> effort to show it in the OOM messages,

I do not think we are in agreement here. As mentioned above the oom
report is quite heavy already. So it should be other way around. There
should be a strong reason to add something more. A real use case where
not having that information is making debugging ooms considerably much
harder.
Yafang Shao Nov. 23, 2019, 5:52 a.m. UTC | #6
On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 20-11-19 20:23:54, Yafang Shao wrote:
> > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > > > A task running in a memcg may OOM because of the memory.min settings of his
> > > > > > slibing and parent. If this happens, the current oom messages can't show
> > > > > > why file page cache can't be reclaimed.
> > > > >
> > > > > min limit is not the only way to protect memory from being reclaim. The
> > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > > > quota exceeded for memcg).
> > > >
> > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
> > >
> > > Not really. Consider a memcg which has reached it's swap limit. The
> > > anonymous memory is not really reclaimable even when there is a lot of
> > > swap space available.
> > >
> >
> > The memcg swap limit is already printed in oom messages, see bellow,
> >
> > [  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > [  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
>
> But you do not have any insight on the swap limit down the oom
> hierarchy, do you?
>
> > > > Why not just print the memcgs which are under memory.min protection or
> > > > something like a total number of min protected memory ?
> > >
> > > Yes, this would likely help. But the main question really reamains, is
> > > this really worth it?
> > >
> >
> > If it doesn't cost too much, I think it is worth to do it.
> > As the oom path is not the critical path, so adding some print info
> > should not add much overhead.
>
> Generating a lot of output for the oom reports has been a real problem
> in many deployments.

So why not only print non-zero counters ?
If some counters are 0, we don't print them, that can reduce the oom reports.

Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0
unstable:0" can all be removed,
and we consider them as zero by default.
I mean we can optimze the OOM reports and only print the useful
information to make it not be a problem in many deployments.

> [...]
> > > > I have said in the commit log, that we don't know why the file cache
> > > > can't be reclaimed (when evictable is 0 and dirty is 0 as well.)
> > >
> > > And the counter argument is that this will not help you there much in
> > > many large and much more common cases.
> > >
> > > I argue, and I might be wrong here so feel free to correct me, that the
> > > reclaim protection guarantee (min) is something to be under admins
> > > control. It shouldn't really happen nilly-willy because it has really
> > > large consequences, the OOM including. So if there is a suspicious
> > > amount of memory that could be reclaimed normally then the reclaim
> > > protection is really the first suspect to go after.
> > > --
> >
> > I don't know whether it happens nilly-willy or not.
>
> It is a reclaim protection guarantee (so essentially an mlock like
> thing) so it better have to be properly considered when used.
>
> > But if we all know that it may cause OOMs and it don't take too much
> > effort to show it in the OOM messages,
>
> I do not think we are in agreement here. As mentioned above the oom
> report is quite heavy already. So it should be other way around. There
> should be a strong reason to add something more. A real use case where
> not having that information is making debugging ooms considerably much
> harder.
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Nov. 25, 2019, 8:20 a.m. UTC | #7
On Sat 23-11-19 13:52:57, Yafang Shao wrote:
> On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 20-11-19 20:23:54, Yafang Shao wrote:
> > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > >
> > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > > > > A task running in a memcg may OOM because of the memory.min settings of his
> > > > > > > slibing and parent. If this happens, the current oom messages can't show
> > > > > > > why file page cache can't be reclaimed.
> > > > > >
> > > > > > min limit is not the only way to protect memory from being reclaim. The
> > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > > > > quota exceeded for memcg).
> > > > >
> > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
> > > >
> > > > Not really. Consider a memcg which has reached it's swap limit. The
> > > > anonymous memory is not really reclaimable even when there is a lot of
> > > > swap space available.
> > > >
> > >
> > > The memcg swap limit is already printed in oom messages, see bellow,
> > >
> > > [  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > > [  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> >
> > But you do not have any insight on the swap limit down the oom
> > hierarchy, do you?
> >
> > > > > Why not just print the memcgs which are under memory.min protection or
> > > > > something like a total number of min protected memory ?
> > > >
> > > > Yes, this would likely help. But the main question really reamains, is
> > > > this really worth it?
> > > >
> > >
> > > If it doesn't cost too much, I think it is worth to do it.
> > > As the oom path is not the critical path, so adding some print info
> > > should not add much overhead.
> >
> > Generating a lot of output for the oom reports has been a real problem
> > in many deployments.
> 
> So why not only print non-zero counters ?
> If some counters are 0, we don't print them, that can reduce the oom reports.
> 
> Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0
> unstable:0" can all be removed,
> and we consider them as zero by default.

because that would make parsing more complex.

> I mean we can optimze the OOM reports and only print the useful
> information to make it not be a problem in many deployments.

We can, but it would be great to have it backed by som real usecase to
change the current behavior. I haven't heard anything so far. It is all
about "this would be nice" without a strong justification.
Yafang Shao Nov. 25, 2019, 9:12 a.m. UTC | #8
On Mon, Nov 25, 2019 at 4:20 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sat 23-11-19 13:52:57, Yafang Shao wrote:
> > On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 20-11-19 20:23:54, Yafang Shao wrote:
> > > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > >
> > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > > > > > A task running in a memcg may OOM because of the memory.min settings of his
> > > > > > > > slibing and parent. If this happens, the current oom messages can't show
> > > > > > > > why file page cache can't be reclaimed.
> > > > > > >
> > > > > > > min limit is not the only way to protect memory from being reclaim. The
> > > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > > > > > quota exceeded for memcg).
> > > > > >
> > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
> > > > >
> > > > > Not really. Consider a memcg which has reached it's swap limit. The
> > > > > anonymous memory is not really reclaimable even when there is a lot of
> > > > > swap space available.
> > > > >
> > > >
> > > > The memcg swap limit is already printed in oom messages, see bellow,
> > > >
> > > > [  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > > > [  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> > >
> > > But you do not have any insight on the swap limit down the oom
> > > hierarchy, do you?
> > >
> > > > > > Why not just print the memcgs which are under memory.min protection or
> > > > > > something like a total number of min protected memory ?
> > > > >
> > > > > Yes, this would likely help. But the main question really reamains, is
> > > > > this really worth it?
> > > > >
> > > >
> > > > If it doesn't cost too much, I think it is worth to do it.
> > > > As the oom path is not the critical path, so adding some print info
> > > > should not add much overhead.
> > >
> > > Generating a lot of output for the oom reports has been a real problem
> > > in many deployments.
> >
> > So why not only print non-zero counters ?
> > If some counters are 0, we don't print them, that can reduce the oom reports.
> >
> > Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0
> > unstable:0" can all be removed,
> > and we consider them as zero by default.
>
> because that would make parsing more complex.
>
> > I mean we can optimze the OOM reports and only print the useful
> > information to make it not be a problem in many deployments.
>
> We can, but it would be great to have it backed by som real usecase to
> change the current behavior. I haven't heard anything so far. It is all
> about "this would be nice" without a strong justification.

Because I was told by you that "Generating a lot of output for the oom
reports has been a real problem in many deployments.".
Maybe I misunderstood you : (

Thanks
Yafang
Michal Hocko Nov. 25, 2019, 9:27 a.m. UTC | #9
On Mon 25-11-19 17:12:54, Yafang Shao wrote:
> On Mon, Nov 25, 2019 at 4:20 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sat 23-11-19 13:52:57, Yafang Shao wrote:
> > > On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 20-11-19 20:23:54, Yafang Shao wrote:
> > > > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > >
> > > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > > > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > > > > > > A task running in a memcg may OOM because of the memory.min settings of his
> > > > > > > > > slibing and parent. If this happens, the current oom messages can't show
> > > > > > > > > why file page cache can't be reclaimed.
> > > > > > > >
> > > > > > > > min limit is not the only way to protect memory from being reclaim. The
> > > > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > > > > > > quota exceeded for memcg).
> > > > > > >
> > > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages.
> > > > > >
> > > > > > Not really. Consider a memcg which has reached it's swap limit. The
> > > > > > anonymous memory is not really reclaimable even when there is a lot of
> > > > > > swap space available.
> > > > > >
> > > > >
> > > > > The memcg swap limit is already printed in oom messages, see bellow,
> > > > >
> > > > > [  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > > > > [  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> > > >
> > > > But you do not have any insight on the swap limit down the oom
> > > > hierarchy, do you?
> > > >
> > > > > > > Why not just print the memcgs which are under memory.min protection or
> > > > > > > something like a total number of min protected memory ?
> > > > > >
> > > > > > Yes, this would likely help. But the main question really reamains, is
> > > > > > this really worth it?
> > > > > >
> > > > >
> > > > > If it doesn't cost too much, I think it is worth to do it.
> > > > > As the oom path is not the critical path, so adding some print info
> > > > > should not add much overhead.
> > > >
> > > > Generating a lot of output for the oom reports has been a real problem
> > > > in many deployments.
> > >
> > > So why not only print non-zero counters ?
> > > If some counters are 0, we don't print them, that can reduce the oom reports.
> > >
> > > Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0
> > > unstable:0" can all be removed,
> > > and we consider them as zero by default.
> >
> > because that would make parsing more complex.
> >
> > > I mean we can optimze the OOM reports and only print the useful
> > > information to make it not be a problem in many deployments.
> >
> > We can, but it would be great to have it backed by som real usecase to
> > change the current behavior. I haven't heard anything so far. It is all
> > about "this would be nice" without a strong justification.
> 
> Because I was told by you that "Generating a lot of output for the oom
> reports has been a real problem in many deployments.".
> Maybe I misunderstood you : (

I meant to say that we do not want to add more output unless there is a
real need. Changing the existing output is usually tricky because there
are existing parsers that might break on that change. Not that we do
guarantee the output format officially but we try hard to preserve as
much as we can if that is possible.

That being said maybe the lack of reclaim guarantee information is a
more wide spread problem and more people are suffering from it. If that
is the case then the justification would be much easier to formulate.
diff mbox series

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c4c08b..97fdd93 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1519,16 +1519,27 @@  void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
  */
 void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
+	struct mem_cgroup *iter;
 	char *buf;
 
 	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
 		K((u64)page_counter_read(&memcg->memory)),
 		K((u64)memcg->memory.max), memcg->memory.failcnt);
-	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		pr_info("swap: usage %llukB, limit %llukB, failcnt %lu\n",
 			K((u64)page_counter_read(&memcg->swap)),
 			K((u64)memcg->swap.max), memcg->swap.failcnt);
-	else {
+
+		pr_info("Memory cgroup min setting:\n");
+		for_each_mem_cgroup_tree(iter, memcg) {
+			pr_cont_cgroup_path(iter->css.cgroup);
+			pr_cont(":");
+
+			pr_cont(" min %lluKB emin %lluKB\n",
+				K((u64)iter->memory.min),
+				K((u64)iter->memory.emin));
+		}
+	} else {
 		pr_info("memory+swap: usage %llukB, limit %llukB, failcnt %lu\n",
 			K((u64)page_counter_read(&memcg->memsw)),
 			K((u64)memcg->memsw.max), memcg->memsw.failcnt);