X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13903701
Date: Wed, 11 Dec 2024 10:53:36 -0500
From: Rik van Riel
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Nhat Pham, Yosry Ahmed
Subject: [PATCH] memcg: allow exiting tasks to write back data to swap
Message-ID: <20241211105336.380cb545@fangorn>

A task that is already exiting can get stuck trying to allocate pages
when all of the following hold: its cgroup is at the memory.max limit,
the cgroup is using zswap, zswap writeback is disabled for the cgroup
(memory.zswap.writeback is 0, so pages that zswap cannot store are not
written to the swap device either), and the remaining memory in the
cgroup is not compressible.

This seems like an unlikely confluence of events, but it can happen
quite easily if a cgroup is OOM killed for exceeding its memory.max
limit and all the tasks in the cgroup try to exit simultaneously.
When this happens, it can sometimes take hours for the tasks to exit,
as they all try to squeeze pages into zswap to bring the group's memory
consumption below memory.max.

Allowing these exiting tasks to push some memory from their own cgroup
out to swap lets them quickly bring the cgroup's memory consumption
below memory.max and exit in seconds rather than hours. Loading this
fix as a live patch on a system where a workload was stuck exiting
allowed the workload to exit within a fraction of a second.

Signed-off-by: Rik van Riel
---
 mm/memcontrol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..03d77e93087e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5371,6 +5371,15 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 	if (!zswap_is_enabled())
 		return true;
 
+	/*
+	 * Always allow exiting tasks to push data to swap. A process in
+	 * the middle of exit cannot get OOM killed, but may need to push
+	 * uncompressible data to swap in order to get the cgroup memory
+	 * use below the limit, and make progress with the exit.
+	 */
+	if ((current->flags & PF_EXITING) && memcg == mem_cgroup_from_task(current))
+		return true;
+
 	for (; memcg; memcg = parent_mem_cgroup(memcg))
 		if (!READ_ONCE(memcg->zswap_writeback))
 			return false;