From patchwork Wed May 4 21:44:28 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12838665
Date: Wed, 4 May 2022 14:44:28 -0700
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Message-Id: <20220504214437.2850685-5-zokeefe@google.com>
References: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 04/13] mm/khugepaged: make hugepage allocation context-specific
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, Peter Xu, SeongJae Park, Song Liu,
 Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
 Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
 "Zach O'Keefe"

Add a hook to struct collapse_control that allows contexts to define
their own allocation semantics and charging logic. For example,
khugepaged has separate NUMA and UMA implementations, as well as gfp
flags tied to /sys/kernel/mm/transparent_hugepage/khugepaged/defrag.

Additionally, move the [pre]allocated hugepage pointer into struct
collapse_control.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 90 ++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c94bc43dff3e..6095fcb3f07c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,6 +92,10 @@ struct collapse_control {
 	/* Last target selected in khugepaged_find_target_node() */
 	int last_target_node;
+
+	struct page *hpage;
+	int (*alloc_charge_hpage)(struct mm_struct *mm,
+				  struct collapse_control *cc);
 };
 
 /**
@@ -866,18 +870,19 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(gfp_t gfp, int node,
+				  struct collapse_control *cc)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
+	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);
 
-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!*hpage)) {
+	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	if (unlikely(!cc->hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		*hpage = ERR_PTR(-ENOMEM);
+		cc->hpage = ERR_PTR(-ENOMEM);
 		return false;
 	}
 
-	prep_transhuge_page(*hpage);
+	prep_transhuge_page(cc->hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
 	return true;
 }
@@ -941,9 +946,10 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(gfp_t gfp, int node,
+				  struct collapse_control *cc)
 {
-	VM_BUG_ON(!*hpage);
+	VM_BUG_ON(!cc->hpage);
 
 	return true;
 }
@@ -1067,8 +1073,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
-			      struct collapse_control *cc)
+static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc)
 {
 #ifdef CONFIG_NUMA
 	const struct cpumask *cpumask;
@@ -1084,17 +1089,17 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
 		set_cpus_allowed_ptr(current, cpumask);
 	}
 #endif
-	if (!khugepaged_alloc_page(hpage, gfp, node))
+	if (!khugepaged_alloc_page(gfp, node, cc))
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
-	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
+	if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, gfp)))
 		return SCAN_CGROUP_CHARGE_FAIL;
-	count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+	count_memcg_page_event(cc->hpage, THP_COLLAPSE_ALLOC);
 	return SCAN_SUCCEED;
 }
 
 static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			       struct page **hpage, int referenced,
-			       int unmapped, struct collapse_control *cc)
+			       int referenced, int unmapped,
+			       struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1116,11 +1121,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = cc->alloc_charge_hpage(mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
-	new_page = *hpage;
+	new_page = cc->hpage;
 
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
@@ -1232,21 +1237,21 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);
 
-	*hpage = NULL;
+	cc->hpage = NULL;
 
 	khugepaged_pages_collapsed++;
 	result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
 
 static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
-			       unsigned long address, struct page **hpage,
+			       unsigned long address,
 			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
@@ -1392,8 +1397,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, referenced, unmapped,
-				   cc);
+		collapse_huge_page(mm, address, referenced, unmapped, cc);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1658,7 +1662,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @mm: process address space where collapse happens
  * @file: file that collapse on
  * @start: collapse start address
- * @hpage: new allocated huge page for collapse
  * @cc: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
@@ -1677,8 +1680,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *    + unlock and free huge page;
  */
 static void collapse_file(struct mm_struct *mm, struct file *file,
-			  pgoff_t start, struct page **hpage,
-			  struct collapse_control *cc)
+			  pgoff_t start, struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *new_page;
@@ -1692,11 +1694,11 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = cc->alloc_charge_hpage(mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-	new_page = *hpage;
+	new_page = cc->hpage;
 
 	/*
	 * Ensure we have slots for all the pages in the range.  This is
@@ -1979,7 +1981,7 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 		 * Remove pte page tables, so we can re-fault the page as huge.
 		 */
 		retract_page_tables(mapping, start);
-		*hpage = NULL;
+		cc->hpage = NULL;
 
 		khugepaged_pages_collapsed++;
 	} else {
@@ -2026,14 +2028,13 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	/* TODO: tracepoints */
 }
 
 static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
-				 pgoff_t start, struct page **hpage,
-				 struct collapse_control *cc)
+				 pgoff_t start, struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2106,7 +2107,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			collapse_file(mm, file, start, hpage, cc);
+			collapse_file(mm, file, start, cc);
 		}
 	}
 
@@ -2114,8 +2115,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
-				 pgoff_t start, struct page **hpage,
-				 struct collapse_control *cc)
+				 pgoff_t start, struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2126,7 +2126,6 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2203,13 +2202,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage,
-						     cc);
+				khugepaged_scan_file(mm, file, pgoff, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
-							  khugepaged_scan.address,
-							  hpage, cc);
+							  khugepaged_scan.address, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2267,15 +2264,15 @@ static int khugepaged_wait_event(void)
 
 static void khugepaged_do_scan(struct collapse_control *cc)
 {
-	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
 	bool wait = true;
 
+	cc->hpage = NULL;
 	lru_add_drain_all();
 
 	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
+		if (!khugepaged_prealloc_page(&cc->hpage, &wait))
 			break;
 
 		cond_resched();
@@ -2289,14 +2286,14 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage, cc);
+							    cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
 	}
 
-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		put_page(cc->hpage);
 }
 
 static bool khugepaged_should_wakeup(void)
@@ -2330,6 +2327,7 @@ static int khugepaged(void *none)
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
 		.last_target_node = NUMA_NO_NODE,
+		.alloc_charge_hpage = &alloc_charge_hpage,
	};
 
 	set_freezable();
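[Editor's note] The core idea of the patch is the function-pointer hook in struct collapse_control: each collapse context supplies its own allocation/charging routine, and the shared collapse paths invoke it through cc->alloc_charge_hpage(). Below is a minimal, self-contained userspace sketch of that pattern, not kernel code: the names mirror the patch, but malloc() stands in for hugepage allocation and memcg charging, and the SCAN_* values are reduced to two outcomes.

/*
 * Sketch of the per-context allocation hook used in this patch.
 * Assumptions: userspace stand-ins for kernel allocation/charging.
 */
#include <stdio.h>
#include <stdlib.h>

enum scan_result { SCAN_SUCCEED, SCAN_ALLOC_HUGE_PAGE_FAIL };

struct collapse_control {
	int last_target_node;
	void *hpage;	/* [pre]allocated "hugepage" owned by the context */
	int (*alloc_charge_hpage)(struct collapse_control *cc);
};

/* khugepaged-style implementation: allocate lazily, remember the result. */
static int khugepaged_alloc_charge_hpage(struct collapse_control *cc)
{
	cc->hpage = malloc(2UL << 20);	/* stand-in for a 2MiB hugepage */
	return cc->hpage ? SCAN_SUCCEED : SCAN_ALLOC_HUGE_PAGE_FAIL;
}

/* Shared collapse path: allocation semantics come from the context. */
static int collapse_one(struct collapse_control *cc)
{
	int result = cc->alloc_charge_hpage(cc);

	if (result != SCAN_SUCCEED)
		return result;
	/* ... copy pages, install the PMD, etc. ... */
	free(cc->hpage);	/* stand-in for consuming/uncharging the page */
	cc->hpage = NULL;
	return SCAN_SUCCEED;
}

int main(void)
{
	struct collapse_control cc = {
		.last_target_node = -1,
		.alloc_charge_hpage = &khugepaged_alloc_charge_hpage,
	};

	printf("collapse: %s\n",
	       collapse_one(&cc) == SCAN_SUCCEED ? "succeeded" : "failed");
	return 0;
}

The point of the indirection is that other collapse contexts introduced later in the series can supply their own allocation and charging semantics simply by initializing .alloc_charge_hpage differently, exactly as khugepaged does at the end of the diff above.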