From patchwork Thu Dec 12 16:57:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13905658
Date: Thu, 12 Dec 2024 11:57:54 -0500
From: Rik van Riel
To: Yosry Ahmed
Cc: Balbir Singh, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com,
 Nhat Pham
Subject: [PATCH v2] memcg: allow exiting tasks to write back data to swap
Message-ID: <20241212115754.38f798b3@fangorn>
X-Mailer: Claws Mail 4.3.0 (GTK 3.24.43; x86_64-redhat-linux-gnu)
Sender: owner-linux-mm@kvack.org

A task already in exit can get stuck trying to allocate pages if its
cgroup is at the memory.max limit, the cgroup is using zswap, zswap
writeback is disabled, and the remaining memory in the cgroup is not
compressible.

This seems like an unlikely confluence of events, but it can happen
quite easily if a cgroup is OOM killed due to exceeding its memory.max
limit, and all the tasks in the cgroup are trying to exit simultaneously.

When this happens, it can sometimes take hours for tasks to exit, as
they are all trying to squeeze things into zswap to bring the group's
memory consumption below memory.max.

Allowing these exiting programs to push some memory from their own
cgroup into swap allows them to quickly bring the cgroup's memory
consumption below memory.max, and exit in seconds rather than hours.

Signed-off-by: Rik van Riel
---
v2: use mm_match_cgroup as suggested by Yosry Ahmed

 mm/memcontrol.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..ba1cd9c04a02 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5371,6 +5371,18 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 	if (!zswap_is_enabled())
 		return true;
 
+	/*
+	 * Always allow exiting tasks to push data to swap. A process in
+	 * the middle of exit cannot get OOM killed, but may need to push
+	 * uncompressible data to swap in order to get the cgroup memory
+	 * use below the limit, and make progress with the exit.
+	 */
+	if (unlikely(current->flags & PF_EXITING)) {
+		struct mm_struct *mm = READ_ONCE(current->mm);
+		if (mm && mm_match_cgroup(mm, memcg))
+			return true;
+	}
+
 	for (; memcg; memcg = parent_mem_cgroup(memcg))
 		if (!READ_ONCE(memcg->zswap_writeback))
 			return false;
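
For anyone who wants to reason about the check order outside the kernel,
here is a minimal userspace sketch of the patched function. It is a
simplified model, not kernel code: model_memcg, model_task,
MODEL_PF_EXITING and the model_* functions are hypothetical stand-ins,
and model_mm_match() assumes that mm_match_cgroup() treats a descendant
cgroup as a match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_EXITING 0x1u	/* stand-in flag value, model only */

struct model_memcg {
	struct model_memcg *parent;	/* NULL at the root */
	bool zswap_writeback;		/* models memory.zswap.writeback */
};

struct model_task {
	unsigned int flags;
	struct model_memcg *mm_memcg;	/* cgroup of the task's mm, or NULL */
};

/* Assumed semantics: match if mm_memcg is memcg or one of its descendants. */
static bool model_mm_match(const struct model_memcg *mm_memcg,
			   const struct model_memcg *memcg)
{
	for (; mm_memcg; mm_memcg = mm_memcg->parent)
		if (mm_memcg == memcg)
			return true;
	return false;
}

/* Same order of checks as the patched function, on the model types. */
static bool model_writeback_enabled(const struct model_task *cur,
				    const struct model_memcg *memcg,
				    bool zswap_enabled)
{
	if (!zswap_enabled)
		return true;

	/* New in this patch: an exiting task may swap its own cgroup. */
	if ((cur->flags & MODEL_PF_EXITING) && cur->mm_memcg &&
	    model_mm_match(cur->mm_memcg, memcg))
		return true;

	/* Otherwise every ancestor must allow writeback. */
	for (; memcg; memcg = memcg->parent)
		if (!memcg->zswap_writeback)
			return false;
	return true;
}

/* Walks through the cases from the changelog; returns 0 if all hold. */
int model_demo(void)
{
	struct model_memcg parent = { .parent = NULL, .zswap_writeback = true };
	struct model_memcg child = { .parent = &parent, .zswap_writeback = false };
	struct model_memcg other = { .parent = &parent, .zswap_writeback = true };
	struct model_task exiting = { .flags = MODEL_PF_EXITING, .mm_memcg = &child };
	struct model_task running = { .flags = 0, .mm_memcg = &child };
	struct model_task outsider = { .flags = MODEL_PF_EXITING, .mm_memcg = &other };

	/* Exiting task in the limited cgroup bypasses the disabled knob. */
	assert(model_writeback_enabled(&exiting, &child, true));
	/* A task that is not exiting still sees writeback disabled. */
	assert(!model_writeback_enabled(&running, &child, true));
	/* An exiting task from a sibling cgroup gets no exemption. */
	assert(!model_writeback_enabled(&outsider, &child, true));
	return 0;
}
```

The exemption is deliberately narrow in this model: it applies only when
the exiting task's own mm belongs to the cgroup being reclaimed, which
mirrors the mm_match_cgroup() test added in v2.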