From patchwork Wed Jul 6 23:59:24 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908930
Date: Wed, 6 Jul 2022 16:59:24 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-7-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 06/18] mm/khugepaged: add flag to predicate
 khugepaged-only behavior
From: "Zach O'Keefe" <zokeefe@google.com>
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park,
 Song Liu, Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
 Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
 "Zach O'Keefe"

Add an .is_khugepaged flag to struct collapse_control so that
khugepaged-specific behavior can be elided by the MADV_COLLAPSE context.

Start by protecting the khugepaged-specific heuristics behind this flag.
In MADV_COLLAPSE, the user presumably has reason to believe the collapse
will be beneficial, and the khugepaged heuristics shouldn't prevent the
user from doing so:

1) the sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]

2) the requirement that some pages in the region being collapsed be
   young or referenced

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi
---

v6 -> v7: There is no functional change here from v6, just a renaming of
flags to explicitly be predicated on khugepaged.
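Note for reviewers: the gating pattern this patch applies at every
heuristic is small enough to isolate. Below is a minimal, self-contained
userspace sketch of it (not kernel code; max_ptes_none and
none_pte_within_limit are illustrative stand-ins for the kernel
internals): when is_khugepaged is false, the limit check passes
unconditionally, which is exactly how MADV_COLLAPSE opts out.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct collapse_control in mm/khugepaged.c. */
struct collapse_control {
	bool is_khugepaged;	/* true: khugepaged; false: MADV_COLLAPSE */
};

/* Stand-in for the sysfs knob khugepaged_max_ptes_none. */
static unsigned int max_ptes_none = 511;

/*
 * Returns true if the scan may continue. The limit is enforced only
 * when collapse was initiated by khugepaged; an MADV_COLLAPSE caller
 * leaves is_khugepaged false and is never rejected by the knob.
 */
static bool none_pte_within_limit(struct collapse_control *cc,
				  unsigned int none_or_zero)
{
	return none_or_zero <= max_ptes_none || !cc->is_khugepaged;
}

int main(void)
{
	struct collapse_control khugepaged_cc = { .is_khugepaged = true };
	struct collapse_control madvise_cc = { .is_khugepaged = false };

	printf("khugepaged, 512 empty ptes: %s\n",
	       none_pte_within_limit(&khugepaged_cc, 512) ? "ok" : "exceed");
	printf("madvise,    512 empty ptes: %s\n",
	       none_pte_within_limit(&madvise_cc, 512) ? "ok" : "exceed");
	return 0;
}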
---
 mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 147f5828f052..d89056d8cbad 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -73,6 +73,8 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  * default collapse hugepages if there is at least one pte mapped like
  * it would have happened if the vma was large enough during page
  * fault.
+ *
+ * Note that these are only respected if collapse was initiated by khugepaged.
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
@@ -86,6 +88,8 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 #define MAX_PTE_MAPPED_THP 8
 
 struct collapse_control {
+	bool is_khugepaged;
+
 	/* Num pages scanned per node */
 	int node_load[MAX_NUMNODES];
 
@@ -554,6 +558,7 @@ static bool is_refcount_suitable(struct page *page)
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
+					struct collapse_control *cc,
 					struct list_head *compound_pagelist)
 {
 	struct page *page = NULL;
@@ -567,7 +572,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_none(pteval) || (pte_present(pteval) &&
 				is_zero_pfn(pte_pfn(pteval)))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->is_khugepaged)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -587,8 +593,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
 
-		if (page_mapcount(page) > 1 &&
-		    ++shared > khugepaged_max_ptes_shared) {
+		if (cc->is_khugepaged && page_mapcount(page) > 1 &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out;
@@ -654,10 +660,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (PageCompound(page))
 			list_add_tail(&page->lru, compound_pagelist);
 next:
-		/* There should be enough young pte to collapse the page */
-		if (pte_young(pteval) ||
-		    page_is_young(page) || PageReferenced(page) ||
-		    mmu_notifier_test_young(vma->vm_mm, address))
+		/*
+		 * If collapse was initiated by khugepaged, check that there is
+		 * enough young pte to justify collapsing the page
+		 */
+		if (cc->is_khugepaged &&
+		    (pte_young(pteval) || page_is_young(page) ||
+		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
+								     address)))
 			referenced++;
 
 		if (pte_write(pteval))
@@ -666,7 +676,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 	if (unlikely(!writable)) {
 		result = SCAN_PAGE_RO;
-	} else if (unlikely(!referenced)) {
+	} else if (unlikely(cc->is_khugepaged && !referenced)) {
 		result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		result = SCAN_SUCCEED;
@@ -745,6 +755,7 @@ static void khugepaged_alloc_sleep(void)
 }
 
 struct collapse_control khugepaged_collapse_control = {
+	.is_khugepaged = true,
 	.last_target_node = NUMA_NO_NODE,
 };
 
@@ -1023,7 +1034,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmu_notifier_invalidate_range_end(&range);
 
 	spin_lock(pte_ptl);
-	result = __collapse_huge_page_isolate(vma, address, pte,
+	result = __collapse_huge_page_isolate(vma, address, pte, cc,
 					      &compound_pagelist);
 	spin_unlock(pte_ptl);
 
@@ -1114,7 +1125,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
-			if (++unmapped <= khugepaged_max_ptes_swap) {
+			if (++unmapped <= khugepaged_max_ptes_swap ||
+			    !cc->is_khugepaged) {
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries. Please see
@@ -1133,7 +1145,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->is_khugepaged)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1163,8 +1176,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 			goto out_unmap;
 		}
 
-		if (page_mapcount(page) > 1 &&
-		    ++shared > khugepaged_max_ptes_shared) {
+		if (cc->is_khugepaged &&
+		    page_mapcount(page) > 1 &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
@@ -1218,14 +1232,22 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 			result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
-		if (pte_young(pteval) ||
-		    page_is_young(page) || PageReferenced(page) ||
-		    mmu_notifier_test_young(vma->vm_mm, address))
+
+		/*
+		 * If collapse was initiated by khugepaged, check that there is
+		 * enough young pte to justify collapsing the page
+		 */
+		if (cc->is_khugepaged &&
+		    (pte_young(pteval) || page_is_young(page) ||
+		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
+								     address)))
 			referenced++;
 	}
 	if (!writable) {
 		result = SCAN_PAGE_RO;
-	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
+	} else if (cc->is_khugepaged &&
+		   (!referenced ||
+		    (unmapped && referenced < HPAGE_PMD_NR / 2))) {
 		result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		result = SCAN_SUCCEED;
@@ -1894,7 +1916,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			continue;
 
 		if (xa_is_value(page)) {
-			if (++swap > khugepaged_max_ptes_swap) {
+			if (cc->is_khugepaged &&
+			    ++swap > khugepaged_max_ptes_swap) {
 				result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
@@ -1945,7 +1968,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 	rcu_read_unlock();
 
 	if (result == SCAN_SUCCEED) {
-		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
+		    cc->is_khugepaged) {
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
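
Appended for context, not part of the patch: once MADV_COLLAPSE itself
lands later in this series, the flag added here is what lets a
synchronous, user-initiated collapse skip the heuristics gated above. A
hedged userspace sketch of that usage, assuming a kernel built with the
rest of this series (the MADV_COLLAPSE fallback value below is an
assumption against headers that may not define it yet):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* assumed value from the series' uapi patch */
#endif

int main(void)
{
	size_t len = 2UL << 20;		/* one PMD-sized (2MiB) region */
	void *buf = NULL;

	if (posix_memalign(&buf, len, len))
		return 1;
	memset(buf, 1, len);		/* fault the region in as base pages */

	/*
	 * Request a synchronous collapse. This path passes a
	 * collapse_control with is_khugepaged == false, so the sysfs
	 * max_ptes_* limits and the young/referenced requirement gated
	 * in this patch do not apply.
	 */
	if (madvise(buf, len, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");

	free(buf);
	return 0;
}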