From patchwork Wed May 4 21:44:27 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12838664
Date: Wed, 4 May 2022 14:44:27 -0700
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Message-Id: <20220504214437.2850685-4-zokeefe@google.com>
References: <20220504214437.2850685-1-zokeefe@google.com>
X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog
Subject: [PATCH v5 03/13] mm/khugepaged: dedup and simplify hugepage alloc and charging
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, Peter Xu, SeongJae Park, Song Liu,
 Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
 Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
 "Zach O'Keefe"

The following code is duplicated in collapse_huge_page() and
collapse_file():

	/* Only allocate from the target node */
	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;

	new_page = khugepaged_alloc_page(hpage, gfp, node);
	if (!new_page) {
		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
		goto out;
	}

	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
		result = SCAN_CGROUP_CHARGE_FAIL;
		goto out;
	}
	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

Also, "node" is passed as an argument to both collapse_huge_page() and
collapse_file() and obtained the same way, via
khugepaged_find_target_node().

Move all this into a new helper, alloc_charge_hpage(), and remove the
duplicate code from collapse_huge_page() and collapse_file(). Also,
simplify khugepaged_alloc_page() by returning a bool indicating
allocation success instead of a copy of the (possibly) allocated
struct page.

Suggested-by: Peter Xu
Signed-off-by: Zach O'Keefe
---

This patch depends on 'mm/khugepaged: sched to numa node when collapse
huge page', currently under discussion upstream[1], and anticipates that
this functionality would be equally applicable to file-backed collapse.
It also wraps this code in a CONFIG_NUMA #ifdef.

[1] https://lore.kernel.org/linux-mm/20220317065024.2635069-1-maobibo@loongson.cn/

 mm/khugepaged.c | 99 +++++++++++++++++++++++--------------------------
 1 file changed, 46 insertions(+), 53 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d3cb670921cd..c94bc43dff3e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -866,8 +866,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
 	VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -875,12 +874,12 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
-		return NULL;
+		return false;
 	}
 
 	prep_transhuge_page(*hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
-	return *hpage;
+	return true;
 }
 #else
 static int khugepaged_find_target_node(struct collapse_control *cc)
@@ -942,12 +941,11 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
 	VM_BUG_ON(!*hpage);
 
-	return *hpage;
+	return true;
 }
 #endif
 
@@ -1069,10 +1067,34 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-				   unsigned long address,
-				   struct page **hpage,
-				   int node, int referenced, int unmapped)
+static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
+			      struct collapse_control *cc)
+{
+#ifdef CONFIG_NUMA
+	const struct cpumask *cpumask;
+#endif
+	gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+	int node = khugepaged_find_target_node(cc);
+
+#ifdef CONFIG_NUMA
+	/* sched to specified node before huge page memory copy */
+	if (task_node(current) != node) {
+		cpumask = cpumask_of_node(node);
+		if (!cpumask_empty(cpumask))
+			set_cpus_allowed_ptr(current, cpumask);
+	}
+#endif
+	if (!khugepaged_alloc_page(hpage, gfp, node))
+		return SCAN_ALLOC_HUGE_PAGE_FAIL;
+	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
+		return SCAN_CGROUP_CHARGE_FAIL;
+	count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+	return SCAN_SUCCEED;
+}
+
+static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			       struct page **hpage, int referenced,
+			       int unmapped, struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1083,14 +1105,9 @@ static void collapse_huge_page(struct mm_struct *mm,
 	int isolated = 0, result = 0;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
-	gfp_t gfp;
-	const struct cpumask *cpumask;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
 	/*
 	 * Before allocating the hugepage, release the mmap_lock read lock.
 	 * The allocation can take potentially a long time if it involves
@@ -1099,23 +1116,11 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	mmap_read_unlock(mm);
 
-	/* sched to specified node before huage page memory copy */
-	if (task_node(current) != node) {
-		cpumask = cpumask_of_node(node);
-		if (!cpumask_empty(cpumask))
-			set_cpus_allowed_ptr(current, cpumask);
-	}
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+	result = alloc_charge_hpage(hpage, mm, cc);
+	if (result != SCAN_SUCCEED)
 		goto out_nolock;
-	}
 
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
-		goto out_nolock;
-	}
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+	new_page = *hpage;
 
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
@@ -1386,10 +1391,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, node,
-				referenced, unmapped);
+		collapse_huge_page(mm, address, hpage, referenced, unmapped,
+				   cc);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1655,7 +1659,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @file: file that collapse on
  * @start: collapse start address
  * @hpage: new allocated huge page for collapse
- * @node: appointed node the new huge page allocate from
+ * @cc: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and lock a new huge page;
@@ -1672,12 +1676,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *    + restore gaps in the page cache;
  *    + unlock and free huge page;
  */
-static void collapse_file(struct mm_struct *mm,
-			  struct file *file, pgoff_t start,
-			  struct page **hpage, int node)
+static void collapse_file(struct mm_struct *mm, struct file *file,
+			  pgoff_t start, struct page **hpage,
+			  struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
-	gfp_t gfp;
 	struct page *new_page;
 	pgoff_t index, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
@@ -1689,20 +1692,11 @@ static void collapse_file(struct mm_struct *mm,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+	result = alloc_charge_hpage(hpage, mm, cc);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}
 
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
-		goto out;
-	}
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+	new_page = *hpage;
 
 	/*
 	 * Ensure we have slots for all the pages in the range. This is
@@ -2112,8 +2106,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node(cc);
-			collapse_file(mm, file, start, hpage, node);
+			collapse_file(mm, file, start, hpage, cc);
 		}
 	}
 
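
The control flow this patch consolidates can be tried outside the kernel.
What follows is a minimal userspace sketch, not kernel code: it only
illustrates a bool-returning allocator wrapped by a helper that maps each
failure to a status code, as alloc_charge_hpage() does. Every fake_* name,
struct fake_page, and struct fake_env is a hypothetical stand-in invented
for illustration; NUMA placement, gfp flags, and memcg accounting are not
modelled, and the allocator leaves *hpage as NULL on failure rather than
storing an ERR_PTR().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes, named after the kernel's SCAN_* values. */
enum scan_result {
	SCAN_SUCCEED,
	SCAN_ALLOC_HUGE_PAGE_FAIL,
	SCAN_CGROUP_CHARGE_FAIL,
};

/* Hypothetical stand-in for struct page. */
struct fake_page {
	bool charged;
};

/* Hypothetical knobs that let a caller force each failure path. */
struct fake_env {
	bool alloc_fails;
	bool charge_fails;
};

/* Returns true on success and publishes the page through *hpage. */
bool fake_alloc_page(struct fake_page **hpage, const struct fake_env *env)
{
	assert(*hpage == NULL);	/* caller must not pass a live page */

	if (env->alloc_fails)
		return false;
	*hpage = calloc(1, sizeof(**hpage));
	return *hpage != NULL;
}

/* Returns 0 on success and non-zero on failure. */
int fake_charge(struct fake_page *page, const struct fake_env *env)
{
	if (env->charge_fails)
		return -1;
	page->charged = true;
	return 0;
}

/*
 * Single entry point for both steps. On a charge failure the page stays
 * in *hpage, so releasing it remains the caller's job.
 */
enum scan_result fake_alloc_charge_hpage(struct fake_page **hpage,
					 const struct fake_env *env)
{
	if (!fake_alloc_page(hpage, env))
		return SCAN_ALLOC_HUGE_PAGE_FAIL;
	if (fake_charge(*hpage, env))
		return SCAN_CGROUP_CHARGE_FAIL;
	return SCAN_SUCCEED;
}
```

A caller then needs a single check, for example
"if (fake_alloc_charge_hpage(&page, &env) != SCAN_SUCCEED) goto out;",
and reads the page back through the pointer it passed in, which mirrors
the "new_page = *hpage;" lines added to both collapse paths.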