From: Johannes Weiner
To: Michal Hocko, Shakeel Butt, Roman Gushchin, Tejun Heo
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Christian Brauner
Subject: [RFC PATCH] mm: memcontrol: don't account swap failures not due to cgroup limits
Date: Thu, 2 Feb 2023 10:56:26 -0500
Message-Id: <20230202155626.1829121-1-hannes@cmpxchg.org>
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 13126438

Christian reports the following situation in a cgroup that doesn't have
memory.swap.max configured:

  $ cat memory.swap.events
  high 0
  max 0
  fail 6218

Upon closer examination, this is an ARM64 machine that doesn't support
swapping out THPs.
In that case, the first get_swap_page() fails, and the kernel falls
back to splitting the THP and swapping the 4k constituents one by one.
/proc/vmstat confirms this with a high rate of thp_swpout_fallback
events.

While the behavior can ultimately be explained, it's unexpected and
confusing. I see three ways to address this:

a) Specifically exclude THP fallbacks from being counted, as the
   failure is transient and the memory is ultimately swapped.

   Arguably, though, the user would like to know if their cgroup's swap
   limit is causing high rates of THP splitting during swapout.

b) Only count cgroup swap events when they are actually due to a
   cgroup's own limit. Exclude failures that are due to physical swap
   shortage or other system-level conditions (like !THP_SWAP). Also
   count them at the level where the limit is configured, which may be
   above the local cgroup that holds the page-to-be-swapped.

   This is in line with how memory.swap.high, memory.high and
   memory.max events are counted.

   However, it's a change in documented behavior.

c) Leave it as is. The documentation says system-level events are
   counted, so stick to that.

   This is the conservative option, but isn't very user-friendly.
   Cgroup events are usually due to a local control choice made by the
   user. Mixing in events that are beyond the user's control makes it
   difficult to identify root causes and configure the system properly.

Implement option b).

Reported-by: Christian Brauner
Signed-off-by: Johannes Weiner
Acked-by: Shakeel Butt
---
 Documentation/admin-guide/cgroup-v2.rst |  6 +++---
 mm/memcontrol.c                         | 12 +++++-------
 mm/swap_slots.c                         |  2 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..a8ffb89a4169 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1605,9 +1605,9 @@ PAGE_SIZE multiple when read back.
 		failed.
 
 	  fail
-		The number of times swap allocation failed either
-		because of running out of swap system-wide or max
-		limit.
+
+		The number of times swap allocation failed because of
+		the max limit.
 
 	When reduced under the current usage, the existing swap
 	entries are reclaimed gradually and the swap usage may stay
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0394ab..c2a6206ce84b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7470,17 +7470,15 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry)
 	if (!memcg)
 		return 0;
 
-	if (!entry.val) {
-		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
-		return 0;
-	}
-
 	memcg = mem_cgroup_id_get_online(memcg);
 
 	if (!mem_cgroup_is_root(memcg) &&
 	    !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
-		memcg_memory_event(memcg, MEMCG_SWAP_MAX);
-		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
+		struct mem_cgroup *swap_over_limit;
+
+		swap_over_limit = mem_cgroup_from_counter(counter, swap);
+		memcg_memory_event(swap_over_limit, MEMCG_SWAP_MAX);
+		memcg_memory_event(swap_over_limit, MEMCG_SWAP_FAIL);
 		mem_cgroup_id_put(memcg);
 		return -ENOMEM;
 	}
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 0bec1f705f8e..66076bd60e2b 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -342,7 +342,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 
 	get_swap_pages(1, &entry, 1);
 out:
-	if (mem_cgroup_try_charge_swap(folio, entry)) {
+	if (entry.val && mem_cgroup_try_charge_swap(folio, entry) < 0) {
 		put_swap_folio(folio, entry);
 		entry.val = 0;
 	}
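
For readers who want to see the counting rule of option b) in isolation, here is a minimal userspace sketch. It is not kernel code: `struct group`, `try_charge()` and `alloc_swap()` are hypothetical stand-ins invented for illustration, and the `slot_ok` flag merely models whether the system-level slot allocation produced a non-zero `entry.val`. The sketch counts a failure only at the level whose limit was hit and does not model any propagation of counters to ancestors.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; not the kernel's struct mem_cgroup. */
struct group {
	struct group *parent;   /* NULL for the topmost group */
	long swap_usage;        /* pages currently charged at this level */
	long swap_max;          /* limit in pages; negative means "no limit" */
	long swap_fail_events;  /* models "fail" in memory.swap.events */
};

/*
 * Charge nr pages to g and every ancestor. On success return NULL.
 * On failure, undo the partial charge and return the group whose
 * limit was exceeded, which may be an ancestor of g.
 */
static struct group *try_charge(struct group *g, long nr)
{
	struct group *c, *undo;

	for (c = g; c; c = c->parent) {
		if (c->swap_max >= 0 && c->swap_usage + nr > c->swap_max) {
			for (undo = g; undo != c; undo = undo->parent)
				undo->swap_usage -= nr;
			return c;
		}
		c->swap_usage += nr;
	}
	return NULL;
}

/*
 * Model of the allocation path after the patch: a system-level failure
 * (slot_ok == false) is not a cgroup event, and a limit failure is
 * counted where the limit is configured.
 */
static bool alloc_swap(struct group *g, long nr, bool slot_ok)
{
	struct group *over;

	if (!slot_ok)
		return false;           /* no cgroup event recorded */

	over = try_charge(g, nr);
	if (over) {
		over->swap_fail_events++;
		return false;
	}
	return true;
}
```

With a parent limited to two pages and an unlimited child, a failed slot allocation leaves both counters at zero, while a charge from the child that exceeds the parent's limit increments the parent's counter only.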