From patchwork Thu Jul 20 07:08:21 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13319868
Date: Thu, 20 Jul 2023 07:08:21 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-5-yosryahmed@google.com>
Subject: [RFC PATCH 4/8] memcg: support deferred memcg recharging
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
 Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org,
 Yosry Ahmed
Mercier" , Greg Thelen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, Yosry Ahmed X-Rspamd-Queue-Id: 3D60E20016 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: ripd55wnfbiwxhejk7nubcecyzocspmn X-HE-Tag: 1689836916-799144 X-HE-Meta: U2FsdGVkX18SIDQ3w9MfsX/PUc9Qdu0kXCpFH3u2bJsaT8S9jvSikmNK7uOZHecL9HkIkfyVILUYcdOpHl2EW0/ihSLurWx/cRxFk3jz6ICvuX/6nYi5kKGxwCAKg8ZTfiqBH2LvB4BUxaxabAi90Ik/sWXWMrufNIunaf5hcL/KqrF8h9O61eMBwn748rJLs4awBuvVRbWRwZNFpssupsS6aYQTNXaIcP9umHe43THYePXFuXmdZmmZutdymfn6COHIvtNoZjpJZZMTdVMsMayUW9KJnhUWIaLVDjprAX56Y98Szhl8zi+aIczELnSoJNT1qYvWtHyG3FanOtUyVzHpxMYwjPkXdHxEMpGCXSHtV5RJopgu8e85Fb+7MKU3+V5/SRI2fjF0JE1MSdWzwBk0/cO2ycWpnfsfWW7MHPuS6JDAlQVZuNJQGyzwy4o9c/dM+n95lvIyM5toWynDAg1MsbVErMwgcNVRqHBOHCLOKSCCcura4o65JbmM81MJDoHeE7Uu3alFF2/nnWB3Fy8+PcDc1XhGenfjYWIRxlaMqttphFQGncNevty+m5JRpJ9pnHyqOlkkQSOVQjLFFteAXYAUb+2y2ID3vBFg3VS7zUc6tnR9LrKnsnov0uiOtFtfav8ioUZbH1V+t64s5i4vb6v7udEzDXsKa4Wp7j1S7ZAYBfu0A+craBfTjZ7mRaGM5wLWYvhO+I760Z+j05lz8ZOt/xl7v13nZwgRdilYjE/rdRacSbh8z6w3NTtZYO+ja+tRIQg0K+Dsl5MLq1M2ZNuw9ybl7pkgR866YJ3GLsEXVSC3BdX+FFZPpI6bMZLBfVtd0GpE7Iy6Ap+/efhVLs2CY9Aem37nxjRvaeKLsSAgtc+Pdwavw7z/vtzlMWR1e+kc6ivtB0DqWez+Q7JFwh8BqN8an52V+D4G1Mga2ucWAei+t5DhIXgff+xrMdAM++NcUs3faI5JRvV FPgeKfQl r4wgJpYMdt68Fzfq4EGn92YJWrltpLnEvTnwlATwkGMLwBIpu7x9tvqbZuMkd/eMl75ctJ0JR8/mpDZRY2ETOj0e6g8nTu93wyWtxplBOIDNpHZtmrcJU9hgRN7GW0+LZoSzVQAYcouSoytGO94oRDD0NKziCC5XzWRQoroz09F4LCsIc4AAADA+p7vkIcDeRt95K360d6ljDMUkKoJxB8at9syHM2vxiB+VL1xYT18cSnBhnIMeC6tnQyKqPx2LSWgEg1z3lxvbfmvhmRJtGsdL6D9RJ9CapRy25uApbN2XN4+Yltic3FFEYd15xTs5oW9u8fylfqQl3QG1X8kfOUaxMhOKHBccIxNg4vyktsN4N+5ibQnw4Y2ynjoFJmM3TD8V8fbz5uw3mZ8Ykp/kraDCBNHyfW+csxcmAgSGOuIrpB/9Yn3UFjeARM8TfVNqmCx/qMxy6RSnwXs0k2NBe2thYVrGpeS1Qia+B+3XO6v6VMlZeffpqdiAbE5BvGUtywSFBJQTsE37M+b7WdH6WScwWwR3/xxHS1ig46yrLpuSQBDpX7b6366oWPE1iyUyUE7dFDM5jH0720RAJokBYYqqr8r3lW7VQ9uUnlVulk6AGfwBrPkofCuACJljteHlWYVRInnH1qGwF2e8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The previous patch added support for memcg recharging for mapped folios when a memcg goes offline. For unmapped folios, it is not straighforward to find a rightful memcg to recharge the folio to. Hence, add support for deferred recharging. Deferred recharging provides a hook that can be added in data access paths: folio_memcg_deferred_recharge(). folio_memcg_deferred_recharge() will check if the memcg that the folio is charged to is offline. If so, it will queue an asynchronous worker to attempt to recharge the folio to the memcg of the process accessing the folio. An asynchronous worker is used for 2 reasons: (a) Avoid expensive operations on the data access path. (b) Acquring some locks (e.g. folio lock, lruvec lock) is not safe to do from all contexts. Deferring recharging will not cause an OOM kill in the target memcg. If recharging fails for any reason, the worker reschedules itself to retry, unless the folio is freed or the target memcg goes offline. 
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/memcontrol.h |   6 ++
 mm/memcontrol.c            | 125 +++++++++++++++++++++++++++++++++++--
 2 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b41d69685ead..59b653d4a76e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -956,6 +956,8 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 void folio_memcg_lock(struct folio *folio);
 void folio_memcg_unlock(struct folio *folio);
 
+void folio_memcg_deferred_recharge(struct folio *folio);
+
 void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val);
 
 /* try to stablize folio_memcg() for all the pages in a memcg */
@@ -1461,6 +1463,10 @@ static inline void mem_cgroup_unlock_pages(void)
 	rcu_read_unlock();
 }
 
+static inline void folio_memcg_deferred_recharge(struct folio *folio)
+{
+}
+
 static inline void mem_cgroup_handle_over_high(void)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a46bc8f000c8..cf9fb51ecfcc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6398,6 +6398,7 @@ static bool mem_cgroup_recharge_folio(struct folio *folio,
 }
 
 struct folio_memcg_rmap_recharge_arg {
+	struct mem_cgroup *memcg;
 	bool recharged;
 };
 
@@ -6415,10 +6416,12 @@ static bool folio_memcg_rmap_recharge_one(struct folio *folio,
 	 */
 	recharge_arg->recharged = false;
 	while (page_vma_mapped_walk(&pvmw)) {
-		memcg = get_mem_cgroup_from_mm(vma->vm_mm);
+		memcg = recharge_arg->memcg ?:
+			get_mem_cgroup_from_mm(vma->vm_mm);
 		if (mem_cgroup_recharge_folio(folio, memcg))
 			recharge_arg->recharged = true;
-		mem_cgroup_put(memcg);
+		if (!recharge_arg->memcg)
+			mem_cgroup_put(memcg);
 		page_vma_mapped_walk_done(&pvmw);
 		break;
 	}
@@ -6428,9 +6431,13 @@ static bool folio_memcg_rmap_recharge_one(struct folio *folio,
 }
 
 /* Returns true if recharging is successful */
-static bool folio_memcg_rmap_recharge(struct folio *folio)
+static bool folio_memcg_rmap_recharge(struct folio *folio,
+				      struct mem_cgroup *memcg)
 {
-	struct folio_memcg_rmap_recharge_arg arg = { .recharged = false };
+	struct folio_memcg_rmap_recharge_arg arg = {
+		.recharged = false,
+		.memcg = memcg,
+	};
 	struct rmap_walk_control rwc = {
 		.rmap_one = folio_memcg_rmap_recharge_one,
 		.arg = (void *)&arg,
@@ -6527,7 +6534,7 @@ static bool memcg_recharge_lruvec_list(struct lruvec *lruvec,
 			continue;
 		}
 
-		if (folio_memcg_rmap_recharge(folio))
+		if (folio_memcg_rmap_recharge(folio, NULL))
 			*nr_recharged += folio_nr_pages(folio);
 
 		folio_unlock(folio);
@@ -6587,6 +6594,114 @@ static void memcg_recharge_mapped_folios(struct mem_cgroup *memcg)
 	}
 }
 
+/* Result is only stable if @folio is locked */
+static bool should_do_deferred_recharge(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	bool ret;
+
+	rcu_read_lock();
+	memcg = folio_memcg_rcu(folio);
+	ret = memcg && !!(memcg->css.flags & CSS_DYING);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+struct deferred_recharge_work {
+	struct folio *folio;
+	struct mem_cgroup *memcg;
+	struct work_struct work;
+};
+
+static void folio_memcg_do_deferred_recharge(struct work_struct *work)
+{
+	struct deferred_recharge_work *recharge_work = container_of(work,
+				struct deferred_recharge_work, work);
+	struct folio *folio = recharge_work->folio;
+	struct mem_cgroup *new = recharge_work->memcg;
+	struct mem_cgroup *old;
+
+	/* We are holding the last ref to the folio, let it be freed */
+	if (unlikely(folio_ref_count(folio) == 1))
+		goto out;
+
+	if (!folio_isolate_lru(folio))
+		goto out;
+
+	if (unlikely(!folio_trylock(folio)))
+		goto out_putback;
+
+	/* @folio was already recharged since the worker was queued? */
+	if (unlikely(!should_do_deferred_recharge(folio)))
+		goto out_unlock;
+
+	/* @folio was already recharged to @new and it already went offline? */
+	old = folio_memcg(folio);
+	if (unlikely(old == new))
+		goto out_unlock;
+
+	/*
+	 * folio_mapped() must remain stable during the move. If the folio is
+	 * mapped, we must use rmap recharge to serialize against unmapping.
+	 * Otherwise, if the folio is unmapped, the folio lock is held so this
+	 * should prevent faults against the pagecache or swapcache to map it.
+	 */
+	mem_cgroup_start_move_charge(old);
+	if (folio_mapped(folio))
+		folio_memcg_rmap_recharge(folio, new);
+	else
+		mem_cgroup_recharge_folio(folio, new);
+	mem_cgroup_end_move_charge(old);
+
+out_unlock:
+	folio_unlock(folio);
+out_putback:
+	folio_putback_lru(folio);
+out:
+	folio_put(folio);
+	mem_cgroup_put(new);
+	kfree(recharge_work);
+}
+
+/*
+ * Queue deferred work to recharge @folio to current's memcg if needed.
+ */
+void folio_memcg_deferred_recharge(struct folio *folio)
+{
+	struct deferred_recharge_work *recharge_work = NULL;
+	struct mem_cgroup *memcg = NULL;
+
+	/* racy check, the async worker will check again with @folio locked */
+	if (likely(!should_do_deferred_recharge(folio)))
+		return;
+
+	if (unlikely(!memcg_recharge_wq))
+		return;
+
+	if (unlikely(!folio_try_get(folio)))
+		return;
+
+	memcg = get_mem_cgroup_from_mm(current->mm);
+	if (!memcg)
+		goto fail;
+
+	recharge_work = kmalloc(sizeof(*recharge_work), GFP_ATOMIC);
+	if (!recharge_work)
+		goto fail;
+
+	/* we hold refs to both the folio and the memcg we are charging to */
+	recharge_work->folio = folio;
+	recharge_work->memcg = memcg;
+	INIT_WORK(&recharge_work->work, folio_memcg_do_deferred_recharge);
+	queue_work(memcg_recharge_wq, &recharge_work->work);
+	return;
+fail:
+	kfree(recharge_work);
+	mem_cgroup_put(memcg);
+	folio_put(folio);
+}
+
 #ifdef CONFIG_LRU_GEN
 static void mem_cgroup_attach(struct cgroup_taskset *tset)
 {
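
memcg_recharge_wq is used above but is not defined in this patch; it is
assumed to be created earlier in the series. A minimal sketch of the kind of
init-time setup this code relies on follows; the function name and workqueue
flags are assumptions, not taken from the series.

	static struct workqueue_struct *memcg_recharge_wq;

	static int __init memcg_recharge_wq_init(void)
	{
		if (mem_cgroup_disabled())
			return 0;
		/* Unbound: recharge work has no CPU locality requirement. */
		memcg_recharge_wq = alloc_workqueue("memcg_recharge",
						    WQ_UNBOUND, 0);
		WARN_ON(!memcg_recharge_wq);
		return 0;
	}
	subsys_initcall(memcg_recharge_wq_init);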