Message ID | 20181004214050.7417-1-guro@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] mm: don't raise MEMCG_OOM event due to failed high-order allocation | expand |
On Thu, Oct 04, 2018 at 09:41:09PM +0000, Roman Gushchin wrote: > I was reported that on some of our machines containers were restarted > with OOM symptoms without an obvious reason. Despite there were almost > no memory pressure and plenty of page cache, MEMCG_OOM event was > raised occasionally, causing the container management software to > think, that OOM has happened. However, no tasks have been killed. > > The following investigation showed that the problem is caused by > a failing attempt to charge a high-order page. In such case, the > OOM killer is never invoked. As shown below, it can happen under > conditions, which are very far from a real OOM: e.g. there is plenty > of clean page cache and no memory pressure. > > There is no sense in raising an OOM event in this case, as it might > confuse a user and lead to wrong and excessive actions (e.g. restart > the workload, as in my case). > > Let's look at the charging path in try_charge(). If the memory usage > is about memory.max, which is absolutely natural for most memory cgroups, > we try to reclaim some pages. Even if we were able to reclaim > enough memory for the allocation, the following check can fail due to > a race with another concurrent allocation: > > if (mem_cgroup_margin(mem_over_limit) >= nr_pages) > goto retry; > > For regular pages the following condition will save us from triggering > the OOM: > > if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER)) > goto retry; > > But for high-order allocation this condition will intentionally fail. > The reason behind is that we'll likely fall to regular pages anyway, > so it's ok and even preferred to return ENOMEM. > > In this case the idea of raising MEMCG_OOM looks dubious. > > Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after > allocation order check, so that the event won't be raised for high > order allocations. This change doesn't affect regular pages allocation > and charging. > > Signed-off-by: Roman Gushchin <guro@fb.com> > Acked-by: David Rientjes <rientjes@google.com> > Acked-by: Michal Hocko <mhocko@kernel.org> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 8389d6f72a77..8384c681a4b2 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1133,6 +1133,10 @@ PAGE_SIZE multiple when read back. disk readahead. For now OOM in memory cgroup kills tasks iff shortage has happened inside page fault. + This event is not raised if the OOM killer is not + considered as an option, e.g. for failed high-order + allocations. + oom_kill The number of processes belonging to this cgroup killed by any kind of OOM killer. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7bebe2ddec05..81b47d0b14d7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int if (order > PAGE_ALLOC_COSTLY_ORDER) return OOM_SKIPPED; + memcg_memory_event(memcg, MEMCG_OOM); + /* * We are in the middle of the charge context here, so we * don't want to block when potentially sitting on a callstack @@ -2250,8 +2252,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, if (fatal_signal_pending(current)) goto force; - memcg_memory_event(mem_over_limit, MEMCG_OOM); - /* * keep retrying as long as the memcg oom killer is able to make * a forward progress or bypass the charge if the oom killer