From patchwork Fri Feb 19 22:44:05 2021
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 12096313
Date: Fri, 19 Feb 2021 14:44:05 -0800
Message-Id: <20210219224405.1544597-1-shakeelb@google.com>
Subject: [PATCH] memcg: charge before adding to swapcache on swapin
From: Shakeel Butt
To: Hugh Dickins, Johannes Weiner
Cc: Roman Gushchin, Michal Hocko, Andrew Morton, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt

Currently the kernel adds the page allocated for swapin to the swapcache
before charging the page. This is fine, but now we want a per-memcg
swapcache stat, which is essential for folks who want to transparently
migrate from cgroup v1's memsw to cgroup v2's memory and swap counters.

To correctly maintain the per-memcg swapcache stat, one option, which
this patch has adopted, is to charge the page before adding it to the
swapcache. One challenge with this option is the failure case of
add_to_swap_cache(), on which we need to undo the mem_cgroup_charge().
Specifically, undoing mem_cgroup_uncharge_swap() is not simple.

This patch circumvents this specific issue by removing the failure path
of add_to_swap_cache() through __GFP_NOFAIL. Please note that in this
specific situation ENOMEM was the only possible failure of
add_to_swap_cache(), and __GFP_NOFAIL removes it.

Another option was to use __mod_memcg_lruvec_state(NR_SWAPCACHE) in
mem_cgroup_charge(), but then we would need to take care of the
do_swap_page() case where synchronous swap devices bypass the swapcache.
do_swap_page() already does hackery to set and reset the PageSwapCache
bit to make mem_cgroup_charge() execute the swap accounting code, and we
would then need an additional parameter telling it not to touch the
NR_SWAPCACHE stat, as that code path bypasses the swapcache.

This patch adds a memcg charging API explicitly for swapin pages and
cleans up do_swap_page() to not set and reset the PageSwapCache bit.

Signed-off-by: Shakeel Butt
---
Andrew, please couple this patch with the "mm: memcg: add swapcache stat
for memcg v2" patch; there is no urgency for these two to be in 5.12.

 include/linux/memcontrol.h |   8 +++
 mm/memcontrol.c            | 100 ++++++++++++++++++++++----------------
 mm/memory.c                |   8 +--
 mm/swap_state.c            |  16 +++---
 scripts/cc-version.sh      |   0
 5 files changed, 77 insertions(+), 55 deletions(-)
 mode change 100644 => 100755 scripts/cc-version.sh

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..f3af65caddc6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -596,6 +596,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
 }
 
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
+int mem_cgroup_charge_swapin_page(struct page *page, struct mm_struct *mm,
+				  gfp_t gfp, swp_entry_t entry);
 
 void mem_cgroup_uncharge(struct page *page);
 void mem_cgroup_uncharge_list(struct list_head *page_list);
@@ -1141,6 +1143,12 @@ static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mem_cgroup_charge_swapin_page(struct page *page,
+			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline void mem_cgroup_uncharge(struct page *page)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2db2aeac8a9e..a0ad7682f28e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6690,6 +6690,27 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 			atomic_long_read(&parent->memory.children_low_usage)));
 }
 
+static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
+			       gfp_t gfp)
+{
+	unsigned int nr_pages = thp_nr_pages(page);
+	int ret;
+
+	ret = try_charge(memcg, gfp, nr_pages);
+	if (ret)
+		goto out;
+
+	css_get(&memcg->css);
+	commit_charge(page, memcg);
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(memcg, page, nr_pages);
+	memcg_check_events(memcg, page);
+	local_irq_enable();
+out:
+	return ret;
+}
+
 /**
  * mem_cgroup_charge - charge a newly allocated page to a cgroup
  * @page: page to charge
@@ -6699,54 +6720,54 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
  * pages according to @gfp_mask if necessary.
  *
+ * Do not use this for pages allocated for swapin.
+ *
  * Returns 0 on success. Otherwise, an error code is returned.
  */
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
-	unsigned int nr_pages = thp_nr_pages(page);
-	struct mem_cgroup *memcg = NULL;
-	int ret = 0;
+	struct mem_cgroup *memcg;
+	int ret;
 
 	if (mem_cgroup_disabled())
-		goto out;
-
-	if (PageSwapCache(page)) {
-		swp_entry_t ent = { .val = page_private(page), };
-		unsigned short id;
+		return 0;
 
-		/*
-		 * Every swap fault against a single page tries to charge the
-		 * page, bail as early as possible. shmem_unuse() encounters
-		 * already charged pages, too. page and memcg binding is
-		 * protected by the page lock, which serializes swap cache
-		 * removal, which in turn serializes uncharging.
-		 */
-		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		if (page_memcg(compound_head(page)))
-			goto out;
+	memcg = get_mem_cgroup_from_mm(mm);
+	ret = __mem_cgroup_charge(page, memcg, gfp_mask);
+	css_put(&memcg->css);
 
-		id = lookup_swap_cgroup_id(ent);
-		rcu_read_lock();
-		memcg = mem_cgroup_from_id(id);
-		if (memcg && !css_tryget_online(&memcg->css))
-			memcg = NULL;
-		rcu_read_unlock();
-	}
+	return ret;
+}
 
-	if (!memcg)
-		memcg = get_mem_cgroup_from_mm(mm);
+/**
+ * mem_cgroup_charge_swapin_page - charge a newly allocated page for swapin
+ * @page: page to charge
+ * @mm: mm context of the victim
+ * @gfp: reclaim mode
+ * @entry: swap entry for which the page is allocated
+ *
+ * This is similar to mem_cgroup_charge() but only for swapin pages.
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int mem_cgroup_charge_swapin_page(struct page *page, struct mm_struct *mm,
+				  gfp_t gfp, swp_entry_t entry)
+{
+	struct mem_cgroup *memcg;
+	unsigned short id;
+	int ret;
 
-	ret = try_charge(memcg, gfp_mask, nr_pages);
-	if (ret)
-		goto out_put;
+	if (mem_cgroup_disabled())
+		return 0;
 
-	css_get(&memcg->css);
-	commit_charge(page, memcg);
+	id = lookup_swap_cgroup_id(entry);
+	rcu_read_lock();
+	memcg = mem_cgroup_from_id(id);
+	if (!memcg || !css_tryget_online(&memcg->css))
+		memcg = get_mem_cgroup_from_mm(mm);
+	rcu_read_unlock();
 
-	local_irq_disable();
-	mem_cgroup_charge_statistics(memcg, page, nr_pages);
-	memcg_check_events(memcg, page);
-	local_irq_enable();
+	ret = __mem_cgroup_charge(page, memcg, gfp);
 
 	/*
 	 * Cgroup1's unified memory+swap counter has been charged with the
@@ -6760,19 +6781,16 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 	 * correspond 1:1 to page and swap slot lifetimes: we charge the
 	 * page to memory here, and uncharge swap when the slot is freed.
 	 */
-	if (do_memsw_account() && PageSwapCache(page)) {
-		swp_entry_t entry = { .val = page_private(page) };
+	if (!ret && do_memsw_account()) {
 		/*
 		 * The swap entry might not get freed for a long time,
 		 * let's not wait for it. The page already received a
 		 * memory+swap charge, drop the swap entry duplicate.
 		 */
-		mem_cgroup_uncharge_swap(entry, nr_pages);
+		mem_cgroup_uncharge_swap(entry, thp_nr_pages(page));
 	}
 
-out_put:
 	css_put(&memcg->css);
-out:
 	return ret;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index c8e357627318..fb39af50f62d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3311,13 +3311,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 				__SetPageLocked(page);
 				__SetPageSwapBacked(page);
-				set_page_private(page, entry.val);
 
-				/* Tell memcg to use swap ownership records */
-				SetPageSwapCache(page);
-				err = mem_cgroup_charge(page, vma->vm_mm,
-							GFP_KERNEL);
-				ClearPageSwapCache(page);
+				err = mem_cgroup_charge_swapin_page(page,
+					vma->vm_mm, GFP_KERNEL, entry);
 				if (err) {
 					ret = VM_FAULT_OOM;
 					goto out_page;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3cdee7b11da9..816218545a48 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -497,16 +497,15 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	__SetPageLocked(page);
 	__SetPageSwapBacked(page);
 
-	/* May fail (-ENOMEM) if XArray node allocation failed. */
-	if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
-		put_swap_page(page, entry);
+	if (mem_cgroup_charge_swapin_page(page, NULL, gfp_mask, entry))
 		goto fail_unlock;
-	}
 
-	if (mem_cgroup_charge(page, NULL, gfp_mask)) {
-		delete_from_swap_cache(page);
-		goto fail_unlock;
-	}
+	/*
+	 * Use __GFP_NOFAIL to not worry about undoing the changes done by
+	 * mem_cgroup_charge_swapin_page() on failure of add_to_swap_cache().
+	 */
+	add_to_swap_cache(page, entry,
+			(gfp_mask|__GFP_NOFAIL) & GFP_RECLAIM_MASK, &shadow);
 
 	if (shadow)
 		workingset_refault(page, shadow);
@@ -517,6 +516,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	return page;
 
 fail_unlock:
+	put_swap_page(page, entry);
 	unlock_page(page);
 	put_page(page);
 	return NULL;
diff --git a/scripts/cc-version.sh b/scripts/cc-version.sh
old mode 100644
new mode 100755