From patchwork Fri Jul 13 23:07:33 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Rientjes
X-Patchwork-Id: 10524243
Date: Fri, 13 Jul 2018 16:07:33 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Roman Gushchin
cc: Michal Hocko, Vladimir Davydov, Johannes Weiner, Tejun Heo,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [patch v3 -mm 5/6] mm, memcg: separate oom_group from selection criteria

With the current implementation of the cgroup-aware oom killer,
memory.oom_group defines two behaviors:

 - consider the footprint of the "group" consisting of the mem cgroup
   itself and all descendants for comparison with other cgroups, and

 - when selected as the victim mem cgroup, kill all processes attached to
   it and its descendants that are eligible to be killed.

Now that the memory.oom_policy of "tree" considers the memory footprint of
the mem cgroup and all its descendants, separate the memory.oom_group
setting from the selection criteria.

Now, memory.oom_group only controls whether all processes attached to the
victim mem cgroup and its descendants are oom killed (when set to "1") or
the single largest memory consuming process attached to the victim mem
cgroup and its descendants is killed.

This is generally regarded as a property of the workload attached to the
subtree: it depends on whether the workload can continue running and be
useful if a single process is oom killed or whether it's better to kill
all attached processes.

Signed-off-by: David Rientjes
---
 Documentation/admin-guide/cgroup-v2.rst | 21 ++++-----------------
 mm/memcontrol.c                         |  8 ++++----
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1078,25 +1078,12 @@ PAGE_SIZE multiple when read back.
 	A read-write single value file which exists on non-root
 	cgroups.  The default is "0".
 
-	If set, OOM killer will consider the memory cgroup as an
-	indivisible memory consumers and compare it with other memory
-	consumers by it's memory footprint.
-	If such memory cgroup is selected as an OOM victim, all
-	processes belonging to it or it's descendants will be killed.
+	If such memory cgroup is selected as an OOM victim, all processes
+	attached to it and its descendants that are eligible for oom kill
+	(their /proc/pid/oom_score_adj is not oom disabled) will be killed.
 
 	This applies to system-wide OOM conditions and reaching
 	the hard memory limit of the cgroup and their ancestor.
-	If OOM condition happens in a descendant cgroup with it's own
-	memory limit, the memory cgroup can't be considered
-	as an OOM victim, and OOM killer will not kill all belonging
-	tasks.
-
-	Also, OOM killer respects the /proc/pid/oom_score_adj value -1000,
-	and will never kill the unkillable task, even if memory.oom_group
-	is set.
-
-	If cgroup-aware OOM killer is not enabled, ENOTSUPP error
-	is returned on attempt to access the file.
 
   memory.oom_policy
 
@@ -1379,7 +1366,7 @@ When selecting a cgroup as a victim, the OOM killer will kill the process
 with the largest memory footprint.  A user can control this behavior by
 enabling the per-cgroup memory.oom_group option.  If set, it causes the
 OOM killer to kill all processes attached to the cgroup, except processes
-with /proc/pid/oom_score_adj set to -1000 (oom disabled).
+with /proc/pid/oom_score_adj set to OOM_SCORE_ADJ_MIN.
 
 The root cgroup is treated as a leaf memory cgroup as well, so it is
 compared with other leaf memory cgroups.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2943,11 +2943,11 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 			continue;
 
 		/*
-		 * We don't consider non-leaf non-oom_group memory cgroups
-		 * without the oom policy of "tree" as OOM victims.
+		 * We don't consider non-leaf memory cgroups without the oom
+		 * policy of "tree" as OOM victims.
 		 */
-		if (memcg_has_children(iter) && !mem_cgroup_oom_group(iter) &&
-		    iter->oom_policy != MEMCG_OOM_POLICY_TREE)
+		if (iter->oom_policy != MEMCG_OOM_POLICY_TREE &&
+		    memcg_has_children(iter))
 			continue;
 
 		/*
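
For readers who want to see the behavioral change in isolation, the two
versions of the skip condition in select_victim_memcg() can be modelled in
userspace roughly as follows. This is only an illustrative sketch, not
kernel code: struct cg_model, enum oom_policy_model, enum kill_scope and the
three helper functions are hypothetical names introduced here, and
OOM_POLICY_OTHER stands in for any policy that is not "tree".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum oom_policy_model { OOM_POLICY_OTHER, OOM_POLICY_TREE };
enum kill_scope { KILL_LARGEST_PROCESS, KILL_ALL_ELIGIBLE };

struct cg_model {
	enum oom_policy_model policy;	/* models iter->oom_policy */
	bool oom_group;			/* models memory.oom_group */
	bool has_children;		/* models memcg_has_children(iter) */
};

/* Before the patch: oom_group alone made a non-leaf cgroup a candidate. */
static bool skipped_before(const struct cg_model *cg)
{
	assert(cg != NULL);
	return cg->has_children && !cg->oom_group &&
	       cg->policy != OOM_POLICY_TREE;
}

/* After the patch: only the "tree" policy makes a non-leaf a candidate. */
static bool skipped_after(const struct cg_model *cg)
{
	assert(cg != NULL);
	return cg->policy != OOM_POLICY_TREE && cg->has_children;
}

/* After the patch: oom_group only decides what is killed in the victim. */
static enum kill_scope victim_kill_scope(const struct cg_model *cg)
{
	assert(cg != NULL);
	return cg->oom_group ? KILL_ALL_ELIGIBLE : KILL_LARGEST_PROCESS;
}
```

In this model, a non-leaf cgroup with oom_group set but without the "tree"
policy was a candidate victim before the change and is skipped after it,
while leaf cgroups and "tree" cgroups are treated the same either way.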