From: Vasily Averin
Subject: [PATCH memcg 1/1] memcg: prevent false global OOM triggered by memcg limited task
To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton
Cc: Roman Gushchin, Uladzislau Rezki, Vlastimil Babka, Shakeel Butt, Mel Gorman, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel@openvz.org
Date: Mon, 18 Oct 2021 11:14:18 +0300

Currently, memcg-limited user space can trigger a false global OOM.

A user-space task inside a memcg-limited container generates a page
fault. Its handler, do_user_addr_fault(), calls handle_mm_fault(), which
cannot allocate the page because the memcg limit is exceeded, and
returns VM_FAULT_OOM. do_user_addr_fault() then calls
pagefault_out_of_memory(), which finally executes out_of_memory()
without a memcg set and triggers a false global OOM.
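The failure sequence above can be sketched as a small user-space C model. Every name in it (model_memcg, model_task, model_charge(), model_oom_synchronize(), model_fault()) is a hypothetical stand-in for illustration, not a kernel interface, and reclaim, the OOM killer and gfp flags are left out. It only shows that the fault path sees a bare -ENOMEM (VM_FAULT_OOM) and, with nothing recorded in memcg_in_oom, falls through to a global OOM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_result { FAULT_OK, FAULT_RETRY, FAULT_GLOBAL_OOM };

/* Hypothetical stand-in for a memory cgroup: usage and limit in pages. */
struct model_memcg {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical stand-in for the task fields the fault path looks at. */
struct model_task {
	struct model_memcg *memcg;        /* memcg the task is charged to */
	struct model_memcg *memcg_in_oom; /* set only on the memcg OOM path */
};

/* Charge fails with -ENOMEM once the memcg limit would be exceeded. */
static int model_charge(struct model_memcg *memcg, unsigned long nr_pages)
{
	assert(nr_pages > 0);
	if (memcg->usage + nr_pages > memcg->limit)
		return -ENOMEM;
	memcg->usage += nr_pages;
	return 0;
}

/* Pre-patch behaviour: only memcg_in_oom identifies a memcg OOM. */
static bool model_oom_synchronize(struct model_task *t)
{
	if (!t->memcg_in_oom)
		return false; /* "OOM is global, do not handle" */
	t->memcg_in_oom = NULL; /* memcg OOM handling itself is elided */
	return true;
}

/* One page fault: charge a page, otherwise take the OOM path. */
enum fault_result model_fault(struct model_task *t)
{
	if (model_charge(t->memcg, 1) == 0)
		return FAULT_OK;
	/* VM_FAULT_OOM: the caller cannot tell a memcg failure from a global one */
	if (model_oom_synchronize(t))
		return FAULT_RETRY;
	return FAULT_GLOBAL_OOM; /* out_of_memory() with no memcg set */
}
```

With usage equal to limit, model_fault() returns FAULT_GLOBAL_OOM even though only the memcg is exhausted, which is the false global OOM described above.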
At present, do_user_addr_fault() does not know why the page allocation
failed, i.e. whether it was a global or a memcg OOM. Let's use a new flag
in task_struct to save this information. It will be set in
try_charge_memcg() (for the memory controller) and in
obj_cgroup_charge_pages() (for the kmem controller), and will be used in
mem_cgroup_oom_synchronize(), which is called inside
pagefault_out_of_memory(): when the failure was caused by a memcg limit,
it prevents a false global OOM and silently returns to user space, which
will either retry the fault or kill the process if it has received a
fatal signal.

Signed-off-by: Vasily Averin
---
 include/linux/sched.h |  1 +
 mm/memcontrol.c       | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c1a927ddec64..62d186fffb26 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -910,6 +910,7 @@ struct task_struct {
 #endif
 #ifdef CONFIG_MEMCG
 	unsigned			in_user_fault:1;
+	unsigned			is_over_memcg_limit:1;
 #endif
 #ifdef CONFIG_COMPAT_BRK
 	unsigned			brk_randomized:1;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 87e41c3cac10..c977d75bcc5f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1846,7 +1846,7 @@ bool mem_cgroup_oom_synchronize(bool handle)
 
 	/* OOM is global, do not handle */
 	if (!memcg)
-		return false;
+		return current->is_over_memcg_limit;
 
 	if (!handle)
 		goto cleanup;
@@ -2535,6 +2535,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool drained = false;
 	unsigned long pflags;
 
+	if (current->in_user_fault)
+		current->is_over_memcg_limit = false;
 retry:
 	if (consume_stock(memcg, nr_pages))
 		return 0;
@@ -2639,8 +2641,11 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 	}
 nomem:
-	if (!(gfp_mask & __GFP_NOFAIL))
+	if (!(gfp_mask & __GFP_NOFAIL)) {
+		if (current->in_user_fault)
+			current->is_over_memcg_limit = true;
 		return -ENOMEM;
+	}
 force:
 	/*
 	 * The allocation either can't fail or will lead to more memory
@@ -2964,10 +2969,11 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 		}
 		cancel_charge(memcg, nr_pages);
 		ret = -ENOMEM;
+		if (current->in_user_fault)
+			current->is_over_memcg_limit = true;
 	}
 out:
 	css_put(&memcg->css);
-
 	return ret;
 }
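As a rough illustration of the mechanism this patch adds, the user-space C model below mirrors its flag handling: the flag is cleared when a charge starts during a user fault, set on the -ENOMEM path, and returned by the synchronize step when no memcg OOM was recorded. All names are hypothetical stand-ins, not kernel interfaces; reclaim, the OOM killer, __GFP_NOFAIL and the separate obj_cgroup_charge_pages() path are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_result { FAULT_OK, FAULT_RETRY, FAULT_GLOBAL_OOM };

/* Hypothetical stand-in for a memory cgroup: usage and limit in pages. */
struct model_memcg {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical stand-in for the relevant task_struct fields. */
struct model_task {
	struct model_memcg *memcg;
	struct model_memcg *memcg_in_oom; /* set only on the memcg OOM path */
	bool in_user_fault;               /* mirrors in_user_fault */
	bool is_over_memcg_limit;         /* mirrors the new bit */
};

/* Mirrors the flag handling added to try_charge_memcg(). */
static int model_try_charge(struct model_task *t, unsigned long nr_pages)
{
	struct model_memcg *memcg = t->memcg;

	assert(nr_pages > 0);
	if (t->in_user_fault)
		t->is_over_memcg_limit = false; /* reset at the start of a charge */
	if (memcg->usage + nr_pages <= memcg->limit) {
		memcg->usage += nr_pages;
		return 0;
	}
	/* nomem: remember that the memcg limit, not the system, was the cause */
	if (t->in_user_fault)
		t->is_over_memcg_limit = true;
	return -ENOMEM;
}

/* Mirrors the patched early return in mem_cgroup_oom_synchronize(). */
static bool model_oom_synchronize(struct model_task *t)
{
	if (!t->memcg_in_oom)
		return t->is_over_memcg_limit;
	t->memcg_in_oom = NULL; /* memcg OOM handling itself is elided */
	return true;
}

/* page_alloc_fails simulates a global (page allocator) failure. */
enum fault_result model_fault(struct model_task *t, bool page_alloc_fails)
{
	int ret;

	t->in_user_fault = true;
	if (page_alloc_fails)
		ret = -ENOMEM; /* no page, so no charge is attempted */
	else
		ret = model_try_charge(t, 1);
	t->in_user_fault = false;

	if (ret == 0)
		return FAULT_OK;
	if (model_oom_synchronize(t))
		return FAULT_RETRY; /* back to user space; the fault is retried */
	return FAULT_GLOBAL_OOM;
}
```

A memcg-limit failure now yields FAULT_RETRY instead of a global OOM, while a page-allocator failure still reaches FAULT_GLOBAL_OOM. As in the patch, the flag is only reset when a charge is attempted during a user fault, so in this model a page-allocator failure sees whatever value the previous charge left behind; a usage sequence that checks a global failure right after a successful charge therefore still gets FAULT_GLOBAL_OOM.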