From patchwork Thu Jul 20 07:08:17 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13319864
Date: Thu, 20 Jul 2023 07:08:17 +0000
Message-ID: <20230720070825.992023-1-yosryahmed@google.com>
Subject: [RFC PATCH 0/8] memory recharging for offline memcgs
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li,
 Yu Zhao, Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier",
 Greg Thelen, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org, Yosry Ahmed

This patch series implements the proposal from the LSF/MM/BPF 2023
conference for reducing offline/zombie memcgs by memory recharging [1].
The main difference is that this series focuses on recharging and does
not include eviction of any memory charged to offline memcgs.

Two methods of recharging are proposed:

(a) Recharging of mapped folios.

When a memcg is offlined, queue an asynchronous worker that will walk
the lruvec of the offline memcg and try to recharge any mapped folios to
the memcg of one of the processes mapping the folio. The main assumption
is that a process mapping the folio is the "rightful" owner of the
memory.

Currently, this is only supported for evictable folios, as the
unevictable lru is imaginary and we cannot iterate the folios on it. A
separate proposal [2] was made to revive the unevictable lru, which
would allow recharging of unevictable folios.

(b) Deferred recharging of folios.

For folios that are unmapped, or mapped but we fail to recharge them
with (a), we rely on deferred recharging.
Simply put, any time a folio is accessed or dirtied by a userspace
process, and that folio is charged to an offline memcg, we will try to
recharge it to the memcg of the process accessing the folio. Again, we
assume this process should be the "rightful" owner of the memory. This
is also done asynchronously to avoid slowing down the data access path.

In both cases, we never OOM kill the recharging target if it goes above
limit. This is to avoid OOM killing a process an arbitrary amount of
time after it started using memory. This is a conservative policy that
can be revisited later.

The patches in this series are divided as follows:
- Patches 1 & 2 are preliminary refactoring and helper introduction.
- Patches 3 to 5 implement (a) and (b) above.
- Patches 6 & 7 add stats, a sysctl, and a config option.
- Patch 8 is a selftest.

[1] https://lore.kernel.org/linux-mm/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20230618065719.1363271-1-yosryahmed@google.com/

Yosry Ahmed (8):
  memcg: refactor updating memcg->moving_account
  mm: vmscan: add lruvec_for_each_list() helper
  memcg: recharge mapped folios when a memcg is offlined
  memcg: support deferred memcg recharging
  memcg: recharge folios when accessed or dirtied
  memcg: add stats for offline memcgs recharging
  memcg: add sysctl and config option to control memory recharging
  selftests: cgroup: test_memcontrol: add a selftest for memcg recharging

 include/linux/memcontrol.h                   |  14 +
 include/linux/swap.h                         |   8 +
 include/linux/vm_event_item.h                |   5 +
 kernel/sysctl.c                              |  11 +
 mm/Kconfig                                   |  12 +
 mm/memcontrol.c                              | 376 +++++++++++++++++-
 mm/page-writeback.c                          |   2 +
 mm/swap.c                                    |   2 +
 mm/vmscan.c                                  |  28 ++
 mm/vmstat.c                                  |   6 +-
 tools/testing/selftests/cgroup/cgroup_util.c |  14 +
 tools/testing/selftests/cgroup/cgroup_util.h |   1 +
 .../selftests/cgroup/test_memcontrol.c       | 310 +++++++++++++++
 13 files changed, 784 insertions(+), 5 deletions(-)
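
For illustration only, here is a minimal userspace sketch of the
deferred recharging idea in (b). It is not code from this series. All
names (toy_memcg, toy_folio, toy_recharge_folio, and so on) are
hypothetical stand-ins, not the kernel's struct mem_cgroup or struct
folio. The toy also makes two simplifying assumptions: the recharge
happens synchronously rather than in an asynchronous worker, and a
recharge that would push the target above its limit is simply skipped,
which is one possible reading of the "never OOM kill the target" policy
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memcg: only a page counter, a limit, and a state. */
struct toy_memcg {
	long usage;	/* pages currently charged */
	long limit;	/* hard limit, in pages */
	bool online;	/* false once the memcg has been offlined */
};

/* Toy stand-in for a folio: which memcg it is charged to, and its size. */
struct toy_folio {
	struct toy_memcg *memcg;
	long nr_pages;
};

/* Charge only if the memcg stays within its limit; no reclaim, no OOM. */
static bool toy_try_charge(struct toy_memcg *memcg, long nr_pages)
{
	if (memcg->usage + nr_pages > memcg->limit)
		return false;
	memcg->usage += nr_pages;
	return true;
}

/*
 * Move the charge of @folio to @target, but only when the folio is
 * currently charged to an offline memcg and @target is online.
 * Returns true if the charge was moved.
 */
static bool toy_recharge_folio(struct toy_folio *folio,
			       struct toy_memcg *target)
{
	struct toy_memcg *old = folio->memcg;

	assert(old != NULL && target != NULL);

	if (old->online || !target->online || old == target)
		return false;

	/* Assumption: if the target has no room, leave the folio alone. */
	if (!toy_try_charge(target, folio->nr_pages))
		return false;

	old->usage -= folio->nr_pages;
	folio->memcg = target;
	return true;
}

/* Deferred path: a task in @accessor accessed or dirtied @folio. */
static bool toy_folio_accessed(struct toy_folio *folio,
			       struct toy_memcg *accessor)
{
	return toy_recharge_folio(folio, accessor);
}
```

In this toy model, a 4-page folio charged to an offline memcg moves to
an online memcg with enough headroom on first access, while the same
access from a memcg that is already near its limit leaves the charge
where it was.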