Message ID | 1574239985-1916-1-git-send-email-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm, memcg: show memcg min setting in oom messages | expand |
On Wed 20-11-19 03:53:05, Yafang Shao wrote: > A task running in a memcg may OOM because of the memory.min settings of his > slibing and parent. If this happens, the current oom messages can't show > why file page cache can't be reclaimed. min limit is not the only way to protect memory from being reclaim. The memory might be pinned or unreclaimable for other reasons (e.g. swap quota exceeded for memcg). Besides that, there is the very same problem with the global OOM killer, right? And I do not expect we want to print all memcgs in the system (this might be hundreds). > So it is better to show the memcg > min settings. > Let's take an example. > bar bar/memory.max = 1200M memory.min=800M > / \ > barA barB barA/memory.min = 800M memory.current=1G (file page cache) > barB/memory.min = 0 (process in this memcg is allocating page) > > The process will do memcg reclaim if the bar/memory.max is reached. Once > the barA/memory.min is reached it will stop reclaiming file page caches in > barA, and if there is no reclaimable pages in bar and bar/barB it will > enter memcg OOM then. > After this pacch, bellow messages will be show then (only includeing the > relevant messages here). The lines begin with '#' are newly added info (the > '#' symbol is not in the original messages). > memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > ... > # Memory cgroup min setting: > # /bar: min 819200KB emin 0KB > # /bar/barA: min 819200KB emin 819200KB > # /bar/barB: min 0KB emin 0KB > ... > Memory cgroup stats for /bar: > anon 418328576 > file 835756032 > ... > unevictable 0 > ... > oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB > > With the new added information, we can find the memory.min in bar/barA is > reached and the processes in bar/barB can't reclaim file page cache from > bar/barA any more. While without this new added information we don't know > why the file page cache in bar can't be reclaimed. Well, I am not sure this is really usefull enough TBH. It doesn't give you the whole picture and it potentially generates a lot of output in the oom report. FYI we used to have a more precise break down of counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more precise dump info while memcg oom happening") which later got rewritten by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM") Could you be more specific why do you really need this piece of information? > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > --- > mm/memcontrol.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-)
On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > A task running in a memcg may OOM because of the memory.min settings of his > > slibing and parent. If this happens, the current oom messages can't show > > why file page cache can't be reclaimed. > > min limit is not the only way to protect memory from being reclaim. The > memory might be pinned or unreclaimable for other reasons (e.g. swap > quota exceeded for memcg). Both swap or unreclaimabed (unevicteable) is printed in OOM messages. If something else can prevent the file cache being reclaimed, we'd better show them as well. > Besides that, there is the very same problem > with the global OOM killer, right? And I do not expect we want to print > all memcgs in the system (this might be hundreds). > I forgot the global oom... Why not just print the memcgs which are under memory.min protection or something like a total number of min protected memory ? > > So it is better to show the memcg > > min settings. > > Let's take an example. > > bar bar/memory.max = 1200M memory.min=800M > > / \ > > barA barB barA/memory.min = 800M memory.current=1G (file page cache) > > barB/memory.min = 0 (process in this memcg is allocating page) > > > > The process will do memcg reclaim if the bar/memory.max is reached. Once > > the barA/memory.min is reached it will stop reclaiming file page caches in > > barA, and if there is no reclaimable pages in bar and bar/barB it will > > enter memcg OOM then. > > After this pacch, bellow messages will be show then (only includeing the > > relevant messages here). The lines begin with '#' are newly added info (the > > '#' symbol is not in the original messages). > > memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > ... > > # Memory cgroup min setting: > > # /bar: min 819200KB emin 0KB > > # /bar/barA: min 819200KB emin 819200KB > > # /bar/barB: min 0KB emin 0KB > > ... > > Memory cgroup stats for /bar: > > anon 418328576 > > file 835756032 > > ... > > unevictable 0 > > ... > > oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB > > > > With the new added information, we can find the memory.min in bar/barA is > > reached and the processes in bar/barB can't reclaim file page cache from > > bar/barA any more. While without this new added information we don't know > > why the file page cache in bar can't be reclaimed. > > Well, I am not sure this is really usefull enough TBH. It doesn't give > you the whole picture and it potentially generates a lot of output in > the oom report. FYI we used to have a more precise break down of > counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more > precise dump info while memcg oom happening") which later got rewritten > by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM") > At least we'd better print a total protected memory in the oom messages. > Could you be more specific why do you really need this piece of > information? I have said in the commit log, that we don't know why the file cache can't be reclaimed (when evictable is 0 and dirty is 0 as well.) > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > --- > > mm/memcontrol.c | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > -- > Michal Hocko > SUSE Labs
On Wed 20-11-19 18:53:44, Yafang Shao wrote: > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > A task running in a memcg may OOM because of the memory.min settings of his > > > slibing and parent. If this happens, the current oom messages can't show > > > why file page cache can't be reclaimed. > > > > min limit is not the only way to protect memory from being reclaim. The > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > quota exceeded for memcg). > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. Not really. Consider a memcg which has reached it's swap limit. The anonymous memory is not really reclaimable even when there is a lot of swap space available. > If something else can prevent the file cache being reclaimed, we'd > better show them as well. How are you going to do that? How do you track pins on pages? > > Besides that, there is the very same problem > > with the global OOM killer, right? And I do not expect we want to print > > all memcgs in the system (this might be hundreds). > > > > I forgot the global oom... > > Why not just print the memcgs which are under memory.min protection or > something like a total number of min protected memory ? Yes, this would likely help. But the main question really reamains, is this really worth it? > > > So it is better to show the memcg > > > min settings. > > > Let's take an example. > > > bar bar/memory.max = 1200M memory.min=800M > > > / \ > > > barA barB barA/memory.min = 800M memory.current=1G (file page cache) > > > barB/memory.min = 0 (process in this memcg is allocating page) > > > > > > The process will do memcg reclaim if the bar/memory.max is reached. Once > > > the barA/memory.min is reached it will stop reclaiming file page caches in > > > barA, and if there is no reclaimable pages in bar and bar/barB it will > > > enter memcg OOM then. > > > After this pacch, bellow messages will be show then (only includeing the > > > relevant messages here). The lines begin with '#' are newly added info (the > > > '#' symbol is not in the original messages). > > > memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > > ... > > > # Memory cgroup min setting: > > > # /bar: min 819200KB emin 0KB > > > # /bar/barA: min 819200KB emin 819200KB > > > # /bar/barB: min 0KB emin 0KB > > > ... > > > Memory cgroup stats for /bar: > > > anon 418328576 > > > file 835756032 > > > ... > > > unevictable 0 > > > ... > > > oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB > > > > > > With the new added information, we can find the memory.min in bar/barA is > > > reached and the processes in bar/barB can't reclaim file page cache from > > > bar/barA any more. While without this new added information we don't know > > > why the file page cache in bar can't be reclaimed. > > > > Well, I am not sure this is really usefull enough TBH. It doesn't give > > you the whole picture and it potentially generates a lot of output in > > the oom report. FYI we used to have a more precise break down of > > counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more > > precise dump info while memcg oom happening") which later got rewritten > > by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM") > > > > At least we'd better print a total protected memory in the oom messages. > > > Could you be more specific why do you really need this piece of > > information? > > I have said in the commit log, that we don't know why the file cache > can't be reclaimed (when evictable is 0 and dirty is 0 as well.) And the counter argument is that this will not help you there much in many large and much more common cases. I argue, and I might be wrong here so feel free to correct me, that the reclaim protection guarantee (min) is something to be under admins control. It shouldn't really happen nilly-willy because it has really large consequences, the OOM including. So if there is a suspicious amount of memory that could be reclaimed normally then the reclaim protection is really the first suspect to go after.
On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 20-11-19 18:53:44, Yafang Shao wrote: > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > > A task running in a memcg may OOM because of the memory.min settings of his > > > > slibing and parent. If this happens, the current oom messages can't show > > > > why file page cache can't be reclaimed. > > > > > > min limit is not the only way to protect memory from being reclaim. The > > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > > quota exceeded for memcg). > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. > > Not really. Consider a memcg which has reached it's swap limit. The > anonymous memory is not really reclaimable even when there is a lot of > swap space available. > The memcg swap limit is already printed in oom messages, see bellow, [ 141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337 [ 141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > > If something else can prevent the file cache being reclaimed, we'd > > better show them as well. > > How are you going to do that? How do you track pins on pages? > Actually I don't have a clear idea how to track pins on pages yet. > > > Besides that, there is the very same problem > > > with the global OOM killer, right? And I do not expect we want to print > > > all memcgs in the system (this might be hundreds). > > > > > > > I forgot the global oom... > > > > Why not just print the memcgs which are under memory.min protection or > > something like a total number of min protected memory ? > > Yes, this would likely help. But the main question really reamains, is > this really worth it? > If it doesn't cost too much, I think it is worth to do it. As the oom path is not the critical path, so adding some print info should not add much overhead. > > > > So it is better to show the memcg > > > > min settings. > > > > Let's take an example. > > > > bar bar/memory.max = 1200M memory.min=800M > > > > / \ > > > > barA barB barA/memory.min = 800M memory.current=1G (file page cache) > > > > barB/memory.min = 0 (process in this memcg is allocating page) > > > > > > > > The process will do memcg reclaim if the bar/memory.max is reached. Once > > > > the barA/memory.min is reached it will stop reclaiming file page caches in > > > > barA, and if there is no reclaimable pages in bar and bar/barB it will > > > > enter memcg OOM then. > > > > After this pacch, bellow messages will be show then (only includeing the > > > > relevant messages here). The lines begin with '#' are newly added info (the > > > > '#' symbol is not in the original messages). > > > > memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > > > ... > > > > # Memory cgroup min setting: > > > > # /bar: min 819200KB emin 0KB > > > > # /bar/barA: min 819200KB emin 819200KB > > > > # /bar/barB: min 0KB emin 0KB > > > > ... > > > > Memory cgroup stats for /bar: > > > > anon 418328576 > > > > file 835756032 > > > > ... > > > > unevictable 0 > > > > ... > > > > oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB > > > > > > > > With the new added information, we can find the memory.min in bar/barA is > > > > reached and the processes in bar/barB can't reclaim file page cache from > > > > bar/barA any more. While without this new added information we don't know > > > > why the file page cache in bar can't be reclaimed. > > > > > > Well, I am not sure this is really usefull enough TBH. It doesn't give > > > you the whole picture and it potentially generates a lot of output in > > > the oom report. FYI we used to have a more precise break down of > > > counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more > > > precise dump info while memcg oom happening") which later got rewritten > > > by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM") > > > > > > > At least we'd better print a total protected memory in the oom messages. > > > > > Could you be more specific why do you really need this piece of > > > information? > > > > I have said in the commit log, that we don't know why the file cache > > can't be reclaimed (when evictable is 0 and dirty is 0 as well.) > > And the counter argument is that this will not help you there much in > many large and much more common cases. > > I argue, and I might be wrong here so feel free to correct me, that the > reclaim protection guarantee (min) is something to be under admins > control. It shouldn't really happen nilly-willy because it has really > large consequences, the OOM including. So if there is a suspicious > amount of memory that could be reclaimed normally then the reclaim > protection is really the first suspect to go after. > -- I don't know whether it happens nilly-willy or not. But if we all know that it may cause OOMs and it don't take too much effort to show it in the OOM messages, we'd better show it. That can help the admins and the admins don't need to suspect it any more. Thanks Yafang
On Wed 20-11-19 20:23:54, Yafang Shao wrote: > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote: > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > > > A task running in a memcg may OOM because of the memory.min settings of his > > > > > slibing and parent. If this happens, the current oom messages can't show > > > > > why file page cache can't be reclaimed. > > > > > > > > min limit is not the only way to protect memory from being reclaim. The > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > > > quota exceeded for memcg). > > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. > > > > Not really. Consider a memcg which has reached it's swap limit. The > > anonymous memory is not really reclaimable even when there is a lot of > > swap space available. > > > > The memcg swap limit is already printed in oom messages, see bellow, > > [ 141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > [ 141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0 But you do not have any insight on the swap limit down the oom hierarchy, do you? > > > Why not just print the memcgs which are under memory.min protection or > > > something like a total number of min protected memory ? > > > > Yes, this would likely help. But the main question really reamains, is > > this really worth it? > > > > If it doesn't cost too much, I think it is worth to do it. > As the oom path is not the critical path, so adding some print info > should not add much overhead. Generating a lot of output for the oom reports has been a real problem in many deployments. [...] > > > I have said in the commit log, that we don't know why the file cache > > > can't be reclaimed (when evictable is 0 and dirty is 0 as well.) > > > > And the counter argument is that this will not help you there much in > > many large and much more common cases. > > > > I argue, and I might be wrong here so feel free to correct me, that the > > reclaim protection guarantee (min) is something to be under admins > > control. It shouldn't really happen nilly-willy because it has really > > large consequences, the OOM including. So if there is a suspicious > > amount of memory that could be reclaimed normally then the reclaim > > protection is really the first suspect to go after. > > -- > > I don't know whether it happens nilly-willy or not. It is a reclaim protection guarantee (so essentially an mlock like thing) so it better have to be properly considered when used. > But if we all know that it may cause OOMs and it don't take too much > effort to show it in the OOM messages, I do not think we are in agreement here. As mentioned above the oom report is quite heavy already. So it should be other way around. There should be a strong reason to add something more. A real use case where not having that information is making debugging ooms considerably much harder.
On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 20-11-19 20:23:54, Yafang Shao wrote: > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote: > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > > > > A task running in a memcg may OOM because of the memory.min settings of his > > > > > > slibing and parent. If this happens, the current oom messages can't show > > > > > > why file page cache can't be reclaimed. > > > > > > > > > > min limit is not the only way to protect memory from being reclaim. The > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > > > > quota exceeded for memcg). > > > > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. > > > > > > Not really. Consider a memcg which has reached it's swap limit. The > > > anonymous memory is not really reclaimable even when there is a lot of > > > swap space available. > > > > > > > The memcg swap limit is already printed in oom messages, see bellow, > > > > [ 141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > [ 141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > > But you do not have any insight on the swap limit down the oom > hierarchy, do you? > > > > > Why not just print the memcgs which are under memory.min protection or > > > > something like a total number of min protected memory ? > > > > > > Yes, this would likely help. But the main question really reamains, is > > > this really worth it? > > > > > > > If it doesn't cost too much, I think it is worth to do it. > > As the oom path is not the critical path, so adding some print info > > should not add much overhead. > > Generating a lot of output for the oom reports has been a real problem > in many deployments. So why not only print non-zero counters ? If some counters are 0, we don't print them, that can reduce the oom reports. Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0" can all be removed, and we consider them as zero by default. I mean we can optimze the OOM reports and only print the useful information to make it not be a problem in many deployments. > [...] > > > > I have said in the commit log, that we don't know why the file cache > > > > can't be reclaimed (when evictable is 0 and dirty is 0 as well.) > > > > > > And the counter argument is that this will not help you there much in > > > many large and much more common cases. > > > > > > I argue, and I might be wrong here so feel free to correct me, that the > > > reclaim protection guarantee (min) is something to be under admins > > > control. It shouldn't really happen nilly-willy because it has really > > > large consequences, the OOM including. So if there is a suspicious > > > amount of memory that could be reclaimed normally then the reclaim > > > protection is really the first suspect to go after. > > > -- > > > > I don't know whether it happens nilly-willy or not. > > It is a reclaim protection guarantee (so essentially an mlock like > thing) so it better have to be properly considered when used. > > > But if we all know that it may cause OOMs and it don't take too much > > effort to show it in the OOM messages, > > I do not think we are in agreement here. As mentioned above the oom > report is quite heavy already. So it should be other way around. There > should be a strong reason to add something more. A real use case where > not having that information is making debugging ooms considerably much > harder. > > -- > Michal Hocko > SUSE Labs
On Sat 23-11-19 13:52:57, Yafang Shao wrote: > On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 20-11-19 20:23:54, Yafang Shao wrote: > > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote: > > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > > > > > A task running in a memcg may OOM because of the memory.min settings of his > > > > > > > slibing and parent. If this happens, the current oom messages can't show > > > > > > > why file page cache can't be reclaimed. > > > > > > > > > > > > min limit is not the only way to protect memory from being reclaim. The > > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > > > > > quota exceeded for memcg). > > > > > > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. > > > > > > > > Not really. Consider a memcg which has reached it's swap limit. The > > > > anonymous memory is not really reclaimable even when there is a lot of > > > > swap space available. > > > > > > > > > > The memcg swap limit is already printed in oom messages, see bellow, > > > > > > [ 141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > > [ 141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > > > > But you do not have any insight on the swap limit down the oom > > hierarchy, do you? > > > > > > > Why not just print the memcgs which are under memory.min protection or > > > > > something like a total number of min protected memory ? > > > > > > > > Yes, this would likely help. But the main question really reamains, is > > > > this really worth it? > > > > > > > > > > If it doesn't cost too much, I think it is worth to do it. > > > As the oom path is not the critical path, so adding some print info > > > should not add much overhead. > > > > Generating a lot of output for the oom reports has been a real problem > > in many deployments. > > So why not only print non-zero counters ? > If some counters are 0, we don't print them, that can reduce the oom reports. > > Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0 > unstable:0" can all be removed, > and we consider them as zero by default. because that would make parsing more complex. > I mean we can optimze the OOM reports and only print the useful > information to make it not be a problem in many deployments. We can, but it would be great to have it backed by som real usecase to change the current behavior. I haven't heard anything so far. It is all about "this would be nice" without a strong justification.
On Mon, Nov 25, 2019 at 4:20 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Sat 23-11-19 13:52:57, Yafang Shao wrote: > > On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Wed 20-11-19 20:23:54, Yafang Shao wrote: > > > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote: > > > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > > > > > > A task running in a memcg may OOM because of the memory.min settings of his > > > > > > > > slibing and parent. If this happens, the current oom messages can't show > > > > > > > > why file page cache can't be reclaimed. > > > > > > > > > > > > > > min limit is not the only way to protect memory from being reclaim. The > > > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > > > > > > quota exceeded for memcg). > > > > > > > > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. > > > > > > > > > > Not really. Consider a memcg which has reached it's swap limit. The > > > > > anonymous memory is not really reclaimable even when there is a lot of > > > > > swap space available. > > > > > > > > > > > > > The memcg swap limit is already printed in oom messages, see bellow, > > > > > > > > [ 141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > > > [ 141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > > > > > > But you do not have any insight on the swap limit down the oom > > > hierarchy, do you? > > > > > > > > > Why not just print the memcgs which are under memory.min protection or > > > > > > something like a total number of min protected memory ? > > > > > > > > > > Yes, this would likely help. But the main question really reamains, is > > > > > this really worth it? > > > > > > > > > > > > > If it doesn't cost too much, I think it is worth to do it. > > > > As the oom path is not the critical path, so adding some print info > > > > should not add much overhead. > > > > > > Generating a lot of output for the oom reports has been a real problem > > > in many deployments. > > > > So why not only print non-zero counters ? > > If some counters are 0, we don't print them, that can reduce the oom reports. > > > > Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0 > > unstable:0" can all be removed, > > and we consider them as zero by default. > > because that would make parsing more complex. > > > I mean we can optimze the OOM reports and only print the useful > > information to make it not be a problem in many deployments. > > We can, but it would be great to have it backed by som real usecase to > change the current behavior. I haven't heard anything so far. It is all > about "this would be nice" without a strong justification. Because I was told by you that "Generating a lot of output for the oom reports has been a real problem in many deployments.". Maybe I misunderstood you : ( Thanks Yafang
On Mon 25-11-19 17:12:54, Yafang Shao wrote: > On Mon, Nov 25, 2019 at 4:20 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Sat 23-11-19 13:52:57, Yafang Shao wrote: > > > On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > On Wed 20-11-19 20:23:54, Yafang Shao wrote: > > > > > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > > > > > On Wed 20-11-19 18:53:44, Yafang Shao wrote: > > > > > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > > > > > > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote: > > > > > > > > > A task running in a memcg may OOM because of the memory.min settings of his > > > > > > > > > slibing and parent. If this happens, the current oom messages can't show > > > > > > > > > why file page cache can't be reclaimed. > > > > > > > > > > > > > > > > min limit is not the only way to protect memory from being reclaim. The > > > > > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap > > > > > > > > quota exceeded for memcg). > > > > > > > > > > > > > > Both swap or unreclaimabed (unevicteable) is printed in OOM messages. > > > > > > > > > > > > Not really. Consider a memcg which has reached it's swap limit. The > > > > > > anonymous memory is not really reclaimable even when there is a lot of > > > > > > swap space available. > > > > > > > > > > > > > > > > The memcg swap limit is already printed in oom messages, see bellow, > > > > > > > > > > [ 141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337 > > > > > [ 141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > > > > > > > > But you do not have any insight on the swap limit down the oom > > > > hierarchy, do you? > > > > > > > > > > > Why not just print the memcgs which are under memory.min protection or > > > > > > > something like a total number of min protected memory ? > > > > > > > > > > > > Yes, this would likely help. But the main question really reamains, is > > > > > > this really worth it? > > > > > > > > > > > > > > > > If it doesn't cost too much, I think it is worth to do it. > > > > > As the oom path is not the critical path, so adding some print info > > > > > should not add much overhead. > > > > > > > > Generating a lot of output for the oom reports has been a real problem > > > > in many deployments. > > > > > > So why not only print non-zero counters ? > > > If some counters are 0, we don't print them, that can reduce the oom reports. > > > > > > Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0 > > > unstable:0" can all be removed, > > > and we consider them as zero by default. > > > > because that would make parsing more complex. > > > > > I mean we can optimze the OOM reports and only print the useful > > > information to make it not be a problem in many deployments. > > > > We can, but it would be great to have it backed by som real usecase to > > change the current behavior. I haven't heard anything so far. It is all > > about "this would be nice" without a strong justification. > > Because I was told by you that "Generating a lot of output for the oom > reports has been a real problem in many deployments.". > Maybe I misunderstood you : ( I meant to say that we do not want to add more output unless there is a real need. Changing the existing output is usually tricky because there are existing parsers that might break on that change. Not that we do guarantee the output format officially but we try hard to preserve as much as we can if that is possible. That being said maybe the lack of reclaim guarantee information is a more wide spread problem and more people are suffering from it. If that is the case then the justification would be much easier to formulate.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1c4c08b..97fdd93 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1519,16 +1519,27 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct * */ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg) { + struct mem_cgroup *iter; char *buf; pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n", K((u64)page_counter_read(&memcg->memory)), K((u64)memcg->memory.max), memcg->memory.failcnt); - if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) { pr_info("swap: usage %llukB, limit %llukB, failcnt %lu\n", K((u64)page_counter_read(&memcg->swap)), K((u64)memcg->swap.max), memcg->swap.failcnt); - else { + + pr_info("Memory cgroup min setting:\n"); + for_each_mem_cgroup_tree(iter, memcg) { + pr_cont_cgroup_path(iter->css.cgroup); + pr_cont(":"); + + pr_cont(" min %lluKB emin %lluKB\n", + K((u64)iter->memory.min), + K((u64)iter->memory.emin)); + } + } else { pr_info("memory+swap: usage %llukB, limit %llukB, failcnt %lu\n", K((u64)page_counter_read(&memcg->memsw)), K((u64)memcg->memsw.max), memcg->memsw.failcnt);
A task running in a memcg may OOM because of the memory.min settings of his slibing and parent. If this happens, the current oom messages can't show why file page cache can't be reclaimed. So it is better to show the memcg min settings. Let's take an example. bar bar/memory.max = 1200M memory.min=800M / \ barA barB barA/memory.min = 800M memory.current=1G (file page cache) barB/memory.min = 0 (process in this memcg is allocating page) The process will do memcg reclaim if the bar/memory.max is reached. Once the barA/memory.min is reached it will stop reclaiming file page caches in barA, and if there is no reclaimable pages in bar and bar/barB it will enter memcg OOM then. After this pacch, bellow messages will be show then (only includeing the relevant messages here). The lines begin with '#' are newly added info (the '#' symbol is not in the original messages). memory: usage 1228800kB, limit 1228800kB, failcnt 18337 ... # Memory cgroup min setting: # /bar: min 819200KB emin 0KB # /bar/barA: min 819200KB emin 819200KB # /bar/barB: min 0KB emin 0KB ... Memory cgroup stats for /bar: anon 418328576 file 835756032 ... unevictable 0 ... oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB With the new added information, we can find the memory.min in bar/barA is reached and the processes in bar/barB can't reclaim file page cache from bar/barA any more. While without this new added information we don't know why the file page cache in bar can't be reclaimed. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- mm/memcontrol.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)