From patchwork Mon Oct 18 08:13:52 2021
X-Patchwork-Submitter: Vasily Averin
X-Patchwork-Id: 12565307
From: Vasily Averin
Subject: [PATCH memcg 0/1] false global OOM triggered by memcg-limited task
To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton
Cc: Roman Gushchin, Uladzislau Rezki, Vlastimil Babka, Shakeel Butt,
 Mel Gorman, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kernel@openvz.org
Message-ID: <9d10df01-0127-fb40-81c3-cc53c9733c3e@virtuozzo.com>
Date: Mon, 18 Oct 2021 11:13:52 +0300

While checking the patches that fixed broken memcg accounting in
vmalloc, I found another issue: a false global OOM triggered by a
memcg-limited user-space task.

I ran vmalloc-eater in a loop inside a memcg-limited LXC container and
checked that it does not consume host memory beyond the assigned limit,
triggers memcg OOM, and generates "Memory cgroup out of memory"
messages. Everything was as expected.
However, I was surprised to find quite rare global OOM messages too.
I set sysctl vm.panic_on_oom to 1, repeated the test and successfully
crashed the node. Dmesg showed that a global OOM was detected on a
16 GB node with ~10 GB of free memory.

 syz-executor invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=1000
 CPU: 2 PID: 15307 Comm: syz-executor Kdump: loaded Not tainted 5.15.0-rc4+ #55
 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
 Call Trace:
  dump_stack_lvl+0x57/0x72
  dump_header+0x4a/0x2c1
  out_of_memory.cold+0xa/0x7e
  pagefault_out_of_memory+0x46/0x60
  exc_page_fault+0x79/0x2b0
  asm_exc_page_fault+0x1e/0x30
 ...
 Mem-Info:
 Node 0 DMA: 0*4kB 0*8kB <...> = 13296kB
 Node 0 DMA32: 705*4kB (UM) <...> = 2586964kB
 Node 0 Normal: 2743*4kB (UME) <...> = 6904828kB
 ...
 4095866 pages RAM
 ...
 Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

Full dmesg can be found in the attached file.

How could this happen?

A user-space task inside the memcg-limited container generated a page
fault. Its handler, do_user_addr_fault(), called handle_mm_fault(),
which could not allocate the page because the memcg limit was exceeded
and returned VM_FAULT_OOM. Then do_user_addr_fault() called
pagefault_out_of_memory(), which executed out_of_memory() without a
memcg set.

This problem partly depends on one of my recent patches, which disabled
unlimited memory allocation for dying tasks. However, I think the
problem can happen with non-killed tasks too, for example because of
the kmem limit.

At present do_user_addr_fault() does not know why the page allocation
failed, i.e. whether it was a global or a memcg OOM. I propose to save
this information in a new flag in task_struct. It can be set in case of
memcg restrictions in obj_cgroup_charge_pages() (for memory controller)
and in try_charge_memcg() (for kmem controller).
Then it can be used in mem_cgroup_oom_synchronize(), which is called
inside pagefault_out_of_memory(): in case of memcg-related restrictions
it will not trigger a fake global OOM and will return to user space,
which will retry the fault or kill the process if it got a fatal signal.

Thank you,
	Vasily Averin

Vasily Averin (1):
  memcg: prevent false global OOM triggered by memcg limited task.

 include/linux/sched.h |  1 +
 mm/memcontrol.c       | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)
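
To make the proposed control flow easier to follow, here is a minimal
user-space sketch of the idea described above. It is not the patch and
not kernel code: struct task, the memcg_charge_failed flag, and every
function name below are hypothetical stand-ins chosen for illustration,
and the model assumes a single page is charged per fault.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */

enum fault_result { FAULT_OK, FAULT_OOM };
enum oom_action { OOM_NONE, OOM_RETRY_FAULT, OOM_GLOBAL };

struct task {
	long memcg_usage;         /* pages charged to the task's memcg */
	long memcg_limit;         /* memcg limit, in pages */
	bool memcg_charge_failed; /* assumed per-task flag from the proposal */
};

/*
 * Stand-in for the memcg charge path. The charge fails when it would
 * exceed the limit; with use_flag set, the reason is recorded on the task.
 */
static bool charge_memcg(struct task *t, long pages, bool use_flag)
{
	if (t->memcg_usage + pages > t->memcg_limit) {
		if (use_flag)
			t->memcg_charge_failed = true;
		return false;
	}
	t->memcg_usage += pages;
	return true;
}

/* Stand-in for handle_mm_fault(): reports OOM when the charge fails. */
static enum fault_result handle_fault(struct task *t, bool use_flag)
{
	return charge_memcg(t, 1, use_flag) ? FAULT_OK : FAULT_OOM;
}

/*
 * Stand-in for pagefault_out_of_memory(): if the failure was caused by
 * a memcg restriction, consume the flag and return to user space to
 * retry the fault; otherwise fall through to the global OOM path.
 */
static enum oom_action pagefault_oom(struct task *t)
{
	if (t->memcg_charge_failed) {
		t->memcg_charge_failed = false;
		return OOM_RETRY_FAULT;
	}
	return OOM_GLOBAL;
}

/* Stand-in for the page fault handler tying the two steps together. */
static enum oom_action page_fault(struct task *t, bool use_flag)
{
	if (handle_fault(t, use_flag) == FAULT_OK)
		return OOM_NONE;
	return pagefault_oom(t);
}
```

With use_flag set to false the model reproduces the reported behaviour:
a task whose memcg is at its limit ends up in the global OOM path even
though the host has free memory. With use_flag set to true the same
fault is routed back to user space instead.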