From patchwork Mon May 2 18:17:05 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12834608
Date: Mon, 2 May 2022 11:17:05 -0700
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
Message-Id: <20220502181714.3483177-5-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>
X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog
Subject: [PATCH v4 04/13] mm/khugepaged: make hugepage allocation
 context-specific
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
    Michal Hocko, Pasha Tatashin, Peter Xu, SeongJae Park, Song Liu,
    Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
    Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
    Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
    Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
    "Zach O'Keefe"

Add a hook to struct collapse_control that allows contexts to define
their own allocation semantics and charging logic.  For example,
khugepaged has specific NUMA and UMA implementations as well as gfp
flags tied to /sys/kernel/mm/transparent_hugepage/khugepaged/defrag.

Additionally, move the [pre]allocated hugepage pointer into struct
collapse_control.

Signed-off-by: Zach O'Keefe
Reported-by: kernel test robot
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 mm/khugepaged.c | 85 ++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b05fb9a85eab..755c40fe87d2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,6 +92,10 @@ struct collapse_control {
 
 	/* Last target selected in khugepaged_find_target_node() */
 	int last_target_node;
+
+	struct page *hpage;
+	int (*alloc_charge_hpage)(struct mm_struct *mm,
+				  struct collapse_control *cc);
 };
 
 /**
@@ -866,18 +870,19 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(gfp_t gfp, int node,
+				  struct collapse_control *cc)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
+	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);
 
-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!*hpage)) {
+	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	if (unlikely(!cc->hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		*hpage = ERR_PTR(-ENOMEM);
+		cc->hpage = ERR_PTR(-ENOMEM);
 		return false;
 	}
 
-	prep_transhuge_page(*hpage);
+	prep_transhuge_page(cc->hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
 	return true;
 }
@@ -1067,8 +1072,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
-			      struct collapse_control *cc)
+static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc)
 {
 #ifdef CONFIG_NUMA
 	const struct cpumask *cpumask;
@@ -1084,17 +1088,17 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
 			set_cpus_allowed_ptr(current, cpumask);
 	}
 #endif
-	if (!khugepaged_alloc_page(hpage, gfp, node))
+	if (!khugepaged_alloc_page(gfp, node, cc))
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
-	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
+	if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, gfp)))
 		return SCAN_CGROUP_CHARGE_FAIL;
-	count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+	count_memcg_page_event(cc->hpage, THP_COLLAPSE_ALLOC);
 	return SCAN_SUCCEED;
 }
 
 static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			       struct page **hpage, int referenced,
-			       int unmapped, struct collapse_control *cc)
+			       int referenced, int unmapped,
+			       struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1116,11 +1120,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = cc->alloc_charge_hpage(mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
-	new_page = *hpage;
+	new_page = cc->hpage;
 
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
@@ -1232,15 +1236,15 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);
 
-	*hpage = NULL;
+	cc->hpage = NULL;
 
 	khugepaged_pages_collapsed++;
 	result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
@@ -1248,7 +1252,6 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage,
 			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
@@ -1394,8 +1397,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, referenced, unmapped,
-				   cc);
+		collapse_huge_page(mm, address, referenced, unmapped, cc);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1660,7 +1662,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @mm: process address space where collapse happens
  * @file: file that collapse on
  * @start: collapse start address
- * @hpage: new allocated huge page for collapse
  * @cc: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
@@ -1679,8 +1680,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * + unlock and free huge page;
  */
 static void collapse_file(struct mm_struct *mm, struct file *file,
-			  pgoff_t start, struct page **hpage,
-			  struct collapse_control *cc)
+			  pgoff_t start, struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *new_page;
@@ -1694,11 +1694,11 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = cc->alloc_charge_hpage(mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-	new_page = *hpage;
+	new_page = cc->hpage;
 
 	/*
 	 * Ensure we have slots for all the pages in the range. This is
@@ -1981,7 +1981,7 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 		 * Remove pte page tables, so we can re-fault the page as huge.
 		 */
 		retract_page_tables(mapping, start);
-		*hpage = NULL;
+		cc->hpage = NULL;
 
 		khugepaged_pages_collapsed++;
 	} else {
@@ -2028,14 +2028,14 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	/* TODO: tracepoints */
 }
 
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+		struct file *file, pgoff_t start,
+		struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2108,7 +2108,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			collapse_file(mm, file, start, hpage, cc);
+			collapse_file(mm, file, start, cc);
 		}
 	}
 
@@ -2116,8 +2116,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+		struct file *file, pgoff_t start,
+		struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2128,7 +2128,6 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2205,12 +2204,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
+				khugepaged_scan_file(mm, file, pgoff, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
-						khugepaged_scan.address,
-						hpage, cc);
+						khugepaged_scan.address, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2268,15 +2266,15 @@ static int khugepaged_wait_event(void)
 
 static void khugepaged_do_scan(struct collapse_control *cc)
 {
-	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
 	bool wait = true;
 
+	cc->hpage = NULL;
 	lru_add_drain_all();
 
 	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
+		if (!khugepaged_prealloc_page(&cc->hpage, &wait))
 			break;
 
 		cond_resched();
@@ -2290,14 +2288,14 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage, cc);
+							    cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
 	}
 
-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		put_page(cc->hpage);
 }
 
 static bool khugepaged_should_wakeup(void)
@@ -2331,6 +2329,7 @@ static int khugepaged(void *none)
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
 		.last_target_node = NUMA_NO_NODE,
+		.alloc_charge_hpage = &alloc_charge_hpage,
 	};
 
 	set_freezable();
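To make the shape of the refactor easy to see outside the kernel context: the
collapse context struct now owns the (pre)allocated hugepage pointer and an
allocation/charging callback, and the collapse paths call through the hook
instead of taking a struct page ** argument. Below is a minimal, self-contained
userspace sketch of that pattern only; the names (demo_cc, demo_alloc_charge,
demo_collapse) are illustrative assumptions, not kernel APIs.

/*
 * Userspace sketch of the hook pattern: a context struct carries the
 * allocated "hugepage" pointer and a function pointer that supplies
 * context-specific allocation/charging. Illustrative names only.
 */
#include <stdio.h>
#include <stdlib.h>

struct demo_cc {
	void *hpage;				/* analogous to cc->hpage */
	int (*alloc_charge_hpage)(struct demo_cc *cc);
};

/* One possible allocation policy; another context could plug in its own. */
static int demo_alloc_charge(struct demo_cc *cc)
{
	cc->hpage = malloc(4096);
	return cc->hpage ? 0 : -1;		/* 0 plays the role of SCAN_SUCCEED */
}

static int demo_collapse(struct demo_cc *cc)
{
	/*
	 * The collapse path no longer receives a page pointer argument; it
	 * asks the context to allocate, then reads the result from cc->hpage.
	 */
	if (cc->alloc_charge_hpage(cc))
		return -1;
	printf("collapsing into %p\n", cc->hpage);
	free(cc->hpage);
	cc->hpage = NULL;
	return 0;
}

int main(void)
{
	struct demo_cc cc = { .alloc_charge_hpage = &demo_alloc_charge };

	return demo_collapse(&cc) ? EXIT_FAILURE : EXIT_SUCCESS;
}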