From patchwork Tue Apr 26 14:44:01 2022
From: "Zach O'Keefe" <zokeefe@google.com>
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
    Michal Hocko, Pasha Tatashin, Peter Xu, SeongJae Park, Song Liu,
    Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
    Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
    Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
    Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
    "Zach O'Keefe", kernel test robot
Date: Tue, 26 Apr 2022 07:44:01 -0700
Message-Id: <20220426144412.742113-2-zokeefe@google.com>
In-Reply-To: <20220426144412.742113-1-zokeefe@google.com>
Subject: [PATCH v3 01/12] mm/khugepaged: record SCAN_PMD_MAPPED when
 scan_pmd() finds THP

When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PMD_MAPPED if the pmd already maps a THP. Note that
SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
file-collapse path, since the latter might identify pte-mapped compound
pages. This is required by MADV_COLLAPSE, which necessarily needs to
know what hugepage-aligned/sized regions are already pmd-mapped.

Signed-off-by: Zach O'Keefe
Reported-by: kernel test robot
---
 include/trace/events/huge_memory.h |  3 ++-
 mm/internal.h                      |  1 +
 mm/khugepaged.c                    | 30 ++++++++++++++++++++++++++----
 mm/rmap.c                          | 15 +++++++++++++--
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index d651f3437367..9faa678e0a5b 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -33,7 +33,8 @@
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_TRUNCATED,		"truncated")			\
-	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EMe(SCAN_PMD_MAPPED,		"page_pmd_mapped")		\

 #undef EM
 #undef EMe
diff --git a/mm/internal.h b/mm/internal.h
index 0667abd57634..51ae9f71a2a3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -172,6 +172,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 /*
  * in mm/rmap.c:
  */
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
 extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);

 /*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ba8dbd1825da..2933b13fc975 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -51,6 +51,7 @@ enum scan_result {
 	SCAN_CGROUP_CHARGE_FAIL,
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
+	SCAN_PMD_MAPPED,
 };

 #define CREATE_TRACE_POINTS
@@ -987,6 +988,29 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return 0;
 }

+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+				   unsigned long address,
+				   pmd_t **pmd)
+{
+	pmd_t pmde;
+
+	*pmd = mm_find_pmd_raw(mm, address);
+	if (!*pmd)
+		return SCAN_PMD_NULL;
+
+	pmde = pmd_read_atomic(*pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+	barrier();
+#endif
+	if (!pmd_present(pmde) || pmd_none(pmde))
+		return SCAN_PMD_NULL;
+	if (pmd_trans_huge(pmde))
+		return SCAN_PMD_MAPPED;
+	return SCAN_SUCCEED;
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -1238,11 +1262,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);

-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}

 	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db5dc6f..49817f35e65c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -759,13 +759,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 	return vma_address(page, vma);
 }

-pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd = NULL;
-	pmd_t pmde;

 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -780,6 +779,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 		goto out;

 	pmd = pmd_offset(pud, address);
+out:
+	return pmd;
+}
+
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+{
+	pmd_t pmde;
+	pmd_t *pmd;
+
+	pmd = mm_find_pmd_raw(mm, address);
+	if (!pmd)
+		goto out;
 	/*
 	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
 	 * without holding anon_vma lock for write.  So when looking for a
 	 * genuine pmde (in which to find pte), test present and !THP together.
 	 */
 	pmde = *pmd;
 	barrier();
 	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
 		pmd = NULL;
 out:
 	return pmd;
 }

From patchwork Tue Apr 26 14:44:02 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Tue, 26 Apr 2022 07:44:02 -0700
Message-Id: <20220426144412.742113-3-zokeefe@google.com>
In-Reply-To: <20220426144412.742113-1-zokeefe@google.com>
Subject: [PATCH v3 02/12] mm/khugepaged: add struct collapse_control

Modularize hugepage collapse by introducing struct collapse_control.
This structure serves to describe the properties of the requested
collapse, as well as serving as a local scratchpad used during the
collapse itself.
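
For intuition, the ownership change can be sketched outside the kernel.
The following is illustrative user-space C only (hypothetical names,
trimmed logic), not part of the patch: per-scan state moves out of
file-scope globals and into a struct that each caller owns:

    #include <stdio.h>

    #define MAX_NODES 4

    /* Per-request scan state: owned by the caller, not file-scope globals. */
    struct scan_control {
            int node_load[MAX_NODES];   /* pages seen per node during this scan */
            int last_target_node;       /* sticky tie-breaker across scans */
    };

    /* Pick the node with the most hits; break ties round-robin. */
    static int find_target_node(struct scan_control *sc)
    {
            int nid, target = 0, max = 0;

            for (nid = 0; nid < MAX_NODES; nid++) {
                    if (sc->node_load[nid] > max) {
                            max = sc->node_load[nid];
                            target = nid;
                    }
            }
            if (target <= sc->last_target_node) {
                    for (nid = sc->last_target_node + 1; nid < MAX_NODES; nid++) {
                            if (sc->node_load[nid] == max) {
                                    target = nid;
                                    break;
                            }
                    }
            }
            sc->last_target_node = target;
            return target;
    }

    int main(void)
    {
            /* Equal hits on nodes 0 and 1: the tie-breaker alternates targets. */
            struct scan_control sc = { .node_load = {3, 3, 0, 0},
                                       .last_target_node = 0 };

            printf("target: %d\n", find_target_node(&sc));  /* 1 */
            printf("target: %d\n", find_target_node(&sc));  /* 0: back to first max */
            return 0;
    }

With the state carried this way, later patches in the series can run
collapses from other contexts with their own, independent scratchpads.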
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 79 ++++++++++++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2933b13fc975..9d42fa330812 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -86,6 +86,14 @@ static struct kmem_cache *mm_slot_cache __read_mostly;

 #define MAX_PTE_MAPPED_THP 8

+struct collapse_control {
+	/* Num pages scanned per node */
+	int node_load[MAX_NUMNODES];
+
+	/* Last target selected in khugepaged_find_target_node() for this scan */
+	int last_target_node;
+};
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -796,9 +804,7 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }

-static int khugepaged_node_load[MAX_NUMNODES];
-
-static bool khugepaged_scan_abort(int nid)
+static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;

@@ -810,11 +816,11 @@ static bool khugepaged_scan_abort(int nid)
 		return false;

 	/* If there is a count for this node already, it must be acceptable */
-	if (khugepaged_node_load[nid])
+	if (cc->node_load[nid])
 		return false;

 	for (i = 0; i < MAX_NUMNODES; i++) {
-		if (!khugepaged_node_load[i])
+		if (!cc->node_load[i])
 			continue;
 		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
@@ -829,28 +835,28 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }

 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
-	static int last_khugepaged_target_node = NUMA_NO_NODE;
 	int nid, target_node = 0, max_value = 0;

 	/* find first node with max normal pages hit */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
-		if (khugepaged_node_load[nid] > max_value) {
-			max_value = khugepaged_node_load[nid];
+		if (cc->node_load[nid] > max_value) {
+			max_value = cc->node_load[nid];
 			target_node = nid;
 		}

 	/* do some balance if several nodes have the same hit record */
-	if (target_node <= last_khugepaged_target_node)
-		for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES;
-		     nid++)
-			if (max_value == khugepaged_node_load[nid]) {
+	if (target_node <= cc->last_target_node)
+		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
+		     nid++) {
+			if (max_value == cc->node_load[nid]) {
 				target_node = nid;
 				break;
 			}
+		}

-	last_khugepaged_target_node = target_node;
+	cc->last_target_node = target_node;
 	return target_node;
 }

@@ -888,7 +894,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	return *hpage;
 }
 #else
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -1248,7 +1254,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage)
+			       struct page **hpage,
+			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1266,7 +1273,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	if (result != SCAN_SUCCEED)
 		goto out;

-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1332,16 +1339,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

 		/*
 		 * Record which node the original page is from and save this
-		 * information to khugepaged_node_load[].
+		 * information to cc->node_load[].
 		 * Khugepaged will allocate hugepage from the node has the max
 		 * hit record.
 		 */
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
 			goto out_unmap;
@@ -1392,7 +1399,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node();
+		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		collapse_huge_page(mm, address, hpage, node,
 				   referenced, unmapped);
@@ -2044,7 +2051,8 @@ static void collapse_file(struct mm_struct *mm,
 }

 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2055,7 +2063,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,

 	present = 0;
 	swap = 0;
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
@@ -2080,11 +2088,11 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		}

 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			break;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;

 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
@@ -2117,7 +2125,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node();
+			node = khugepaged_find_target_node(cc);
 			collapse_file(mm, file, start, hpage, node);
 		}
 	}
@@ -2126,7 +2134,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2137,7 +2146,8 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif

 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage)
+					    struct page **hpage,
+					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
 {
@@ -2213,12 +2223,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,

 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage);
+				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
 						khugepaged_scan.address,
-						hpage);
+						hpage, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2274,7 +2284,7 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }

-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan(struct collapse_control *cc)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2298,7 +2308,7 @@ static void khugepaged_do_scan(void)

 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage);
+							    &hpage, cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2337,12 +2347,15 @@ static void khugepaged_wait_work(void)

 static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
+	struct collapse_control cc = {
+		.last_target_node = NUMA_NO_NODE,
+	};

 	set_freezable();
 	set_user_nice(current, MAX_NICE);

 	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
+		khugepaged_do_scan(&cc);
 		khugepaged_wait_work();
 	}

From patchwork Tue Apr 26 14:44:03 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Tue, 26 Apr 2022 07:44:03 -0700
Message-Id: <20220426144412.742113-4-zokeefe@google.com>
In-Reply-To: <20220426144412.742113-1-zokeefe@google.com>
Subject: [PATCH v3 03/12] mm/khugepaged: make hugepage allocation
 context-specific

Add hugepage allocation context to struct collapse_control, allowing
different collapse contexts to allocate hugepages differently. For
example, khugepaged decides to allocate differently in NUMA and UMA
configurations, and other collapse contexts shouldn't be coupled to
this decision. Likewise for the gfp flags used for said allocation.

Additionally, move the [pre]allocated hugepage pointer into struct
collapse_control.
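
The indirection is plain C function pointers carried in the context
struct. A minimal user-space sketch of the shape (stand-in types and
allocator, hypothetical names; not the kernel implementation):

    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned int gfp_t;         /* stand-in for kernel gfp flags */

    struct collapse_ctx {
            void *hpage;                /* context-owned (pre)allocated hugepage */
            gfp_t (*gfp)(void);         /* allocation flags for this context */
            void *(*alloc_hpage)(struct collapse_ctx *cc, gfp_t gfp, int node);
    };

    static gfp_t khugepaged_gfpmask(void)
    {
            return 0x1;                 /* pretend "reclaim allowed" policy */
    }

    static void *khugepaged_alloc(struct collapse_ctx *cc, gfp_t gfp, int node)
    {
            (void)gfp; (void)node;
            cc->hpage = malloc(4096);   /* stand-in for __alloc_pages_node() */
            return cc->hpage;
    }

    /* Core code no longer cares how a context obtains its hugepage. */
    static int do_collapse(struct collapse_ctx *cc, int node)
    {
            return cc->alloc_hpage(cc, cc->gfp(), node) ? 0 : -1;
    }

    int main(void)
    {
            struct collapse_ctx cc = {
                    .gfp = khugepaged_gfpmask,
                    .alloc_hpage = khugepaged_alloc,
            };

            printf("collapse: %s\n", do_collapse(&cc, 0) == 0 ? "ok" : "alloc failed");
            free(cc.hpage);
            return 0;
    }

A future context can then wire in a different policy (for instance, a
synchronous caller willing to pay higher allocation cost) without
touching the core collapse path.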
Signed-off-by: Zach O'Keefe
Reported-by: kernel test robot
---
 mm/khugepaged.c | 102 ++++++++++++++++++++++++------------------------
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9d42fa330812..c4962191d6e1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,6 +92,11 @@ struct collapse_control {

 	/* Last target selected in khugepaged_find_target_node() for this scan */
 	int last_target_node;
+
+	struct page *hpage;
+	gfp_t (*gfp)(void);
+	struct page* (*alloc_hpage)(struct collapse_control *cc, gfp_t gfp,
+				    int node);
 };

 /**
@@ -877,21 +882,21 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }

-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static struct page *khugepaged_alloc_page(struct collapse_control *cc,
+					  gfp_t gfp, int node)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
+	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);

-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!*hpage)) {
+	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	if (unlikely(!cc->hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		*hpage = ERR_PTR(-ENOMEM);
+		cc->hpage = ERR_PTR(-ENOMEM);
 		return NULL;
 	}

-	prep_transhuge_page(*hpage);
+	prep_transhuge_page(cc->hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
-	return *hpage;
+	return cc->hpage;
 }
 #else
 static int khugepaged_find_target_node(struct collapse_control *cc)
@@ -953,12 +958,12 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }

-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static struct page *khugepaged_alloc_page(struct collapse_control *cc,
+					  gfp_t gfp, int node)
 {
-	VM_BUG_ON(!*hpage);
+	VM_BUG_ON(!cc->hpage);

-	return *hpage;
+	return cc->hpage;
 }
 #endif

@@ -1080,10 +1085,9 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }

-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       int node, int referenced, int unmapped)
+static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			       struct collapse_control *cc, int referenced,
+			       int unmapped)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1096,11 +1100,12 @@ static void collapse_huge_page(struct mm_struct *mm,
 	struct mmu_notifier_range range;
 	gfp_t gfp;
 	const struct cpumask *cpumask;
+	int node;

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);

 	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+	gfp = cc->gfp() | __GFP_THISNODE;

 	/*
 	 * Before allocating the hugepage, release the mmap_lock read lock.
@@ -1110,13 +1115,14 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	mmap_read_unlock(mm);

+	node = khugepaged_find_target_node(cc);
 	/* sched to specified node before huage page memory copy */
 	if (task_node(current) != node) {
 		cpumask = cpumask_of_node(node);
 		if (!cpumask_empty(cpumask))
 			set_cpus_allowed_ptr(current, cpumask);
 	}
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
+	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out_nolock;
@@ -1238,15 +1244,15 @@ static void collapse_huge_page(struct mm_struct *mm,
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);

-	*hpage = NULL;
+	cc->hpage = NULL;

 	khugepaged_pages_collapsed++;
 	result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
@@ -1254,7 +1260,6 @@ static void collapse_huge_page(struct mm_struct *mm,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage,
 			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
@@ -1399,10 +1404,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, node,
-				   referenced, unmapped);
+		collapse_huge_page(mm, address, cc, referenced, unmapped);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1667,8 +1670,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @mm: process address space where collapse happens
  * @file: file that collapse on
  * @start: collapse start address
- * @hpage: new allocated huge page for collapse
- * @node: appointed node the new huge page allocate from
+ * @cc: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
  * - allocate and lock a new huge page;
@@ -1686,8 +1688,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * + unlock and free huge page;
  */
 static void collapse_file(struct mm_struct *mm,
-			  struct file *file, pgoff_t start,
-			  struct page **hpage, int node)
+			  struct file *file, pgoff_t start,
+			  struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	gfp_t gfp;
@@ -1697,15 +1699,16 @@ static void collapse_file(struct mm_struct *mm,
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
 	int nr_none = 0, result = SCAN_SUCCEED;
 	bool is_shmem = shmem_file(file);
-	int nr;
+	int nr, node;

 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

 	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+	gfp = cc->gfp() | __GFP_THISNODE;
+	node = khugepaged_find_target_node(cc);

-	new_page = khugepaged_alloc_page(hpage, gfp, node);
+	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out;
@@ -1998,7 +2001,7 @@ static void collapse_file(struct mm_struct *mm,
 	 * Remove pte page tables, so we can re-fault the page as huge.
 	 */
 	retract_page_tables(mapping, start);
-	*hpage = NULL;
+	cc->hpage = NULL;

 	khugepaged_pages_collapsed++;
 } else {
@@ -2045,14 +2048,14 @@ static void collapse_file(struct mm_struct *mm,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	/* TODO: tracepoints */
 }

 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+				 struct file *file, pgoff_t start,
+				 struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2125,8 +2128,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node(cc);
-			collapse_file(mm, file, start, hpage, node);
+			collapse_file(mm, file, start, cc);
 		}
 	}

@@ -2134,8 +2136,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+				 struct file *file, pgoff_t start,
+				 struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2146,7 +2148,6 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif

 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2223,12 +2224,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
+				khugepaged_scan_file(mm, file, pgoff, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
-						khugepaged_scan.address,
-						hpage, cc);
+						khugepaged_scan.address, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2286,15 +2286,15 @@ static int khugepaged_wait_event(void)

 static void khugepaged_do_scan(struct collapse_control *cc)
 {
-	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
 	bool wait = true;

+	cc->hpage = NULL;
 	lru_add_drain_all();

 	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
+		if (!khugepaged_prealloc_page(&cc->hpage, &wait))
 			break;

 		cond_resched();
@@ -2308,14 +2308,14 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage, cc);
+							    cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
 	}

-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		put_page(cc->hpage);
 }

 static bool khugepaged_should_wakeup(void)
@@ -2349,6 +2349,8 @@ static int khugepaged(void *none)
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
 		.last_target_node = NUMA_NO_NODE,
+		.gfp = &alloc_hugepage_khugepaged_gfpmask,
+		.alloc_hpage = &khugepaged_alloc_page,
 	};

 	set_freezable();

From patchwork Tue Apr 26 14:44:04 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Tue, 26 Apr 2022 07:44:04 -0700
Message-Id: <20220426144412.742113-5-zokeefe@google.com>
In-Reply-To: <20220426144412.742113-1-zokeefe@google.com>
Subject: [PATCH v3 04/12] mm/khugepaged: add struct collapse_result
, "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Stat-Signature: iomw8ddc9oxzw1mjmp3qjq5d16h4ynhx X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: EAEB91C0058 X-Rspam-User: Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=KHrGHVUn; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf20.hostedemail.com: domain of 3SgVoYgcKCA8E3zttutv33v0t.r310x29C-11zAprz.36v@flex--zokeefe.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=3SgVoYgcKCA8E3zttutv33v0t.r310x29C-11zAprz.36v@flex--zokeefe.bounces.google.com X-HE-Tag: 1650984264-977177 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add struct collapse_result which aggregates data from a single khugepaged_scan_pmd() or khugapaged_scan_file() request. Change khugepaged to take action based on this returned data instead of deep within the collapsing functions themselves. Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 187 ++++++++++++++++++++++++++---------------------- 1 file changed, 101 insertions(+), 86 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c4962191d6e1..0e4f5fbe00d2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -99,6 +99,14 @@ struct collapse_control { int node); }; +/* Gather information from one khugepaged_scan_[pmd|file]() request */ +struct collapse_result { + enum scan_result result; + + /* Was mmap_lock dropped during request? */ + bool dropped_mmap_lock; +}; + /** * struct mm_slot - hash lookup from mm to mm_slot * @hash: hash collision list @@ -743,13 +751,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_SUCCEED; trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); - return 1; + return SCAN_SUCCEED; } out: release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); - return 0; + return result; } static void __collapse_huge_page_copy(pte_t *pte, struct page *page, @@ -1087,7 +1095,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, static void collapse_huge_page(struct mm_struct *mm, unsigned long address, struct collapse_control *cc, int referenced, - int unmapped) + int unmapped, struct collapse_result *cr) { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; @@ -1095,7 +1103,6 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, pgtable_t pgtable; struct page *new_page; spinlock_t *pmd_ptl, *pte_ptl; - int isolated = 0, result = 0; struct vm_area_struct *vma; struct mmu_notifier_range range; gfp_t gfp; @@ -1103,6 +1110,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, int node; VM_BUG_ON(address & ~HPAGE_PMD_MASK); + cr->result = SCAN_FAIL; /* Only allocate from the target node */ gfp = cc->gfp() | __GFP_THISNODE; @@ -1114,6 +1122,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, * that. We will recheck the vma after taking it again in write mode. 
 	 */
 	mmap_read_unlock(mm);
+	cr->dropped_mmap_lock = true;

 	node = khugepaged_find_target_node(cc);
 	/* sched to specified node before huage page memory copy */
@@ -1124,26 +1133,26 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	}
 	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+		cr->result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out_nolock;
 	}

 	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
+		cr->result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out_nolock;
 	}
 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result) {
+	cr->result = hugepage_vma_revalidate(mm, address, &vma);
+	if (cr->result) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}

 	pmd = mm_find_pmd(mm, address);
 	if (!pmd) {
-		result = SCAN_PMD_NULL;
+		cr->result = SCAN_PMD_NULL;
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -1166,8 +1175,8 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result)
+	cr->result = hugepage_vma_revalidate(mm, address, &vma);
+	if (cr->result)
 		goto out_up_write;
 	/* check if the pmd is still valid */
 	if (mm_find_pmd(mm, address) != pmd)
@@ -1194,11 +1203,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmu_notifier_invalidate_range_end(&range);

 	spin_lock(pte_ptl);
-	isolated = __collapse_huge_page_isolate(vma, address, pte,
-						&compound_pagelist);
+	cr->result = __collapse_huge_page_isolate(vma, address, pte,
+						  &compound_pagelist);
 	spin_unlock(pte_ptl);

-	if (unlikely(!isolated)) {
+	if (unlikely(cr->result != SCAN_SUCCEED)) {
 		pte_unmap(pte);
 		spin_lock(pmd_ptl);
 		BUG_ON(!pmd_none(*pmd));
@@ -1210,7 +1219,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
 		spin_unlock(pmd_ptl);
 		anon_vma_unlock_write(vma->anon_vma);
-		result = SCAN_FAIL;
+		cr->result = SCAN_FAIL;
 		goto out_up_write;
 	}

@@ -1246,25 +1255,25 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,

 	cc->hpage = NULL;

-	khugepaged_pages_collapsed++;
-	result = SCAN_SUCCEED;
+	cr->result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
 	if (!IS_ERR_OR_NULL(cc->hpage))
 		mem_cgroup_uncharge(page_folio(cc->hpage));
-	trace_mm_collapse_huge_page(mm, isolated, result);
+	trace_mm_collapse_huge_page(mm, cr->result == SCAN_SUCCEED, cr->result);
 	return;
 }

-static int khugepaged_scan_pmd(struct mm_struct *mm,
-			       struct vm_area_struct *vma,
-			       unsigned long address,
-			       struct collapse_control *cc)
+static void khugepaged_scan_pmd(struct mm_struct *mm,
+				struct vm_area_struct *vma,
+				unsigned long address,
+				struct collapse_control *cc,
+				struct collapse_result *cr)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, result = 0, referenced = 0;
+	int referenced = 0;
 	int none_or_zero = 0, shared = 0;
 	struct page *page = NULL;
 	unsigned long _address;
@@ -1273,9 +1282,10 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	bool writable = false;

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+	cr->result = SCAN_FAIL;

-	result = find_pmd_or_thp_or_none(mm, address, &pmd);
-	if (result != SCAN_SUCCEED)
+	cr->result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (cr->result != SCAN_SUCCEED)
 		goto out;

 	memset(cc->node_load, 0, sizeof(cc->node_load));
@@ -1291,12 +1301,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 				 * comment below for pte_uffd_wp().
 				 */
 				if (pte_swp_uffd_wp(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
+					cr->result = SCAN_PTE_UFFD_WP;
 					goto out_unmap;
 				}
 				continue;
 			} else {
-				result = SCAN_EXCEED_SWAP_PTE;
+				cr->result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				goto out_unmap;
 			}
@@ -1306,7 +1316,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			    ++none_or_zero <= khugepaged_max_ptes_none) {
 				continue;
 			} else {
-				result = SCAN_EXCEED_NONE_PTE;
+				cr->result = SCAN_EXCEED_NONE_PTE;
 				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 				goto out_unmap;
 			}
@@ -1321,7 +1331,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			 * userfault messages that falls outside of
 			 * the registered range. So, just be simple.
 			 */
-			result = SCAN_PTE_UFFD_WP;
+			cr->result = SCAN_PTE_UFFD_WP;
 			goto out_unmap;
 		}
 		if (pte_write(pteval))
@@ -1329,13 +1339,13 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

 		page = vm_normal_page(vma, _address, pteval);
 		if (unlikely(!page)) {
-			result = SCAN_PAGE_NULL;
+			cr->result = SCAN_PAGE_NULL;
 			goto out_unmap;
 		}

 		if (page_mapcount(page) > 1 &&
 				++shared > khugepaged_max_ptes_shared) {
-			result = SCAN_EXCEED_SHARED_PTE;
+			cr->result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
 		}
@@ -1350,20 +1360,20 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 */
 		node = page_to_nid(page);
 		if (khugepaged_scan_abort(node, cc)) {
-			result = SCAN_SCAN_ABORT;
+			cr->result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
 		cc->node_load[node]++;
 		if (!PageLRU(page)) {
-			result = SCAN_PAGE_LRU;
+			cr->result = SCAN_PAGE_LRU;
 			goto out_unmap;
 		}
 		if (PageLocked(page)) {
-			result = SCAN_PAGE_LOCK;
+			cr->result = SCAN_PAGE_LOCK;
 			goto out_unmap;
 		}
 		if (!PageAnon(page)) {
-			result = SCAN_PAGE_ANON;
+			cr->result = SCAN_PAGE_ANON;
 			goto out_unmap;
 		}
@@ -1385,7 +1395,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 * will be done again later the risk seems low.
 		 */
 		if (!is_refcount_suitable(page)) {
-			result = SCAN_PAGE_COUNT;
+			cr->result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
 		if (pte_young(pteval) ||
@@ -1394,23 +1404,20 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			referenced++;
 	}
 	if (!writable) {
-		result = SCAN_PAGE_RO;
+		cr->result = SCAN_PAGE_RO;
 	} else if (!referenced ||
 		   (unmapped && referenced < HPAGE_PMD_NR/2)) {
-		result = SCAN_LACK_REFERENCED_PAGE;
+		cr->result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
-		result = SCAN_SUCCEED;
-		ret = 1;
+		cr->result = SCAN_SUCCEED;
 	}
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
-	if (ret) {
+	if (cr->result == SCAN_SUCCEED)
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, cc, referenced, unmapped);
-	}
+		collapse_huge_page(mm, address, cc, referenced, unmapped, cr);
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
-				     none_or_zero, result, unmapped);
-	return ret;
+				     none_or_zero, cr->result, unmapped);
 }

 static void collect_mm_slot(struct mm_slot *mm_slot)
@@ -1671,6 +1678,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @file: file that collapse on
  * @start: collapse start address
  * @cc: collapse context and scratchpad
+ * @cr: aggregate result information of collapse
  *
  * Basic scheme is simple, details are more complex:
  * - allocate and lock a new huge page;
@@ -1689,7 +1697,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * + unlock and free huge page;
  */
 static void collapse_file(struct mm_struct *mm,
 			  struct file *file, pgoff_t start,
-			  struct collapse_control *cc)
+			  struct collapse_control *cc,
+			  struct collapse_result *cr)
+
 {
 	struct address_space *mapping = file->f_mapping;
 	gfp_t gfp;
@@ -1697,25 +1707,27 @@ static void collapse_file(struct mm_struct *mm,
 	pgoff_t index, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
-	int nr_none = 0, result = SCAN_SUCCEED;
+	int nr_none = 0;
 	bool is_shmem = shmem_file(file);
 	int nr, node;

 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

+	cr->result = SCAN_SUCCEED;
+
 	/* Only allocate from the target node */
 	gfp = cc->gfp() | __GFP_THISNODE;
 	node = khugepaged_find_target_node(cc);

 	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+		cr->result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out;
 	}

 	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
+		cr->result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out;
 	}
 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
@@ -1731,7 +1743,7 @@ static void collapse_file(struct mm_struct *mm,
 			break;
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			goto out;
 		}
 	} while (1);
@@ -1762,13 +1774,13 @@ static void collapse_file(struct mm_struct *mm,
 			 */
 			if (index == start) {
 				if (!xas_next_entry(&xas, end - 1)) {
-					result = SCAN_TRUNCATED;
+					cr->result = SCAN_TRUNCATED;
 					goto xa_locked;
 				}
 				xas_set(&xas, index);
 			}
 			if (!shmem_charge(mapping->host, 1)) {
-				result = SCAN_FAIL;
+				cr->result = SCAN_FAIL;
 				goto xa_locked;
 			}
 			xas_store(&xas, new_page);
@@ -1781,14 +1793,14 @@ static void collapse_file(struct mm_struct *mm,
 				/* swap in or instantiate fallocated page */
 				if (shmem_getpage(mapping->host, index, &page,
 						  SGP_NOALLOC)) {
-					result = SCAN_FAIL;
+					cr->result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
 			} else if (trylock_page(page)) {
 				get_page(page);
 				xas_unlock_irq(&xas);
 			} else {
-				result = SCAN_PAGE_LOCK;
+				cr->result = SCAN_PAGE_LOCK;
 				goto xa_locked;
 			}
 		} else {	/* !is_shmem */
@@ -1801,7 +1813,7 @@ static void collapse_file(struct mm_struct *mm,
 				lru_add_drain();
 				page = find_lock_page(mapping, index);
 				if (unlikely(page == NULL)) {
-					result = SCAN_FAIL;
+					cr->result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
 			} else if (PageDirty(page)) {
@@ -1820,17 +1832,17 @@ static void collapse_file(struct mm_struct *mm,
 				 */
 				xas_unlock_irq(&xas);
 				filemap_flush(mapping);
-				result = SCAN_FAIL;
+				cr->result = SCAN_FAIL;
 				goto xa_unlocked;
 			} else if (PageWriteback(page)) {
 				xas_unlock_irq(&xas);
-				result = SCAN_FAIL;
+				cr->result = SCAN_FAIL;
 				goto xa_unlocked;
 			} else if (trylock_page(page)) {
 				get_page(page);
 				xas_unlock_irq(&xas);
 			} else {
-				result = SCAN_PAGE_LOCK;
+				cr->result = SCAN_PAGE_LOCK;
 				goto xa_locked;
 			}
 		}
@@ -1843,7 +1855,7 @@ static void collapse_file(struct mm_struct *mm,

 		/* make sure the page is up to date */
 		if (unlikely(!PageUptodate(page))) {
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			goto out_unlock;
 		}

@@ -1852,12 +1864,12 @@ static void collapse_file(struct mm_struct *mm,
 		 * we locked the first page, then a THP might be there already.
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			cr->result = SCAN_PAGE_COMPOUND;
 			goto out_unlock;
 		}

 		if (page_mapping(page) != mapping) {
-			result = SCAN_TRUNCATED;
+			cr->result = SCAN_TRUNCATED;
 			goto out_unlock;
 		}

@@ -1868,18 +1880,18 @@ static void collapse_file(struct mm_struct *mm,
 			 * page is dirty because it hasn't been flushed
 			 * since first write.
 			 */
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			goto out_unlock;
 		}

 		if (isolate_lru_page(page)) {
-			result = SCAN_DEL_PAGE_LRU;
+			cr->result = SCAN_DEL_PAGE_LRU;
 			goto out_unlock;
 		}

 		if (page_has_private(page) &&
 		    !try_to_release_page(page, GFP_KERNEL)) {
-			result = SCAN_PAGE_HAS_PRIVATE;
+			cr->result = SCAN_PAGE_HAS_PRIVATE;
 			putback_lru_page(page);
 			goto out_unlock;
 		}
@@ -1900,7 +1912,7 @@ static void collapse_file(struct mm_struct *mm,
 		 * - one from isolate_lru_page;
 		 */
 		if (!page_ref_freeze(page, 3)) {
-			result = SCAN_PAGE_COUNT;
+			cr->result = SCAN_PAGE_COUNT;
 			xas_unlock_irq(&xas);
 			putback_lru_page(page);
 			goto out_unlock;
@@ -1935,7 +1947,7 @@ static void collapse_file(struct mm_struct *mm,
 		 */
 		smp_mb();
 		if (inode_is_open_for_write(mapping->host)) {
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
 			filemap_nr_thps_dec(mapping);
 			goto xa_locked;
@@ -1962,7 +1974,7 @@ static void collapse_file(struct mm_struct *mm,
 	 */
 	try_to_unmap_flush();

-	if (result == SCAN_SUCCEED) {
+	if (cr->result == SCAN_SUCCEED) {
 		struct page *page, *tmp;

 		/*
@@ -2002,8 +2014,6 @@ static void collapse_file(struct mm_struct *mm,
 		 */
 		retract_page_tables(mapping, start);
 		cc->hpage = NULL;
-
-		khugepaged_pages_collapsed++;
 	} else {
 		struct page *page;

@@ -2055,15 +2065,16 @@ static void collapse_file(struct mm_struct *mm,

 static void khugepaged_scan_file(struct mm_struct *mm,
 				 struct file *file, pgoff_t start,
-				 struct collapse_control *cc)
+				 struct collapse_control *cc,
+				 struct collapse_result *cr)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, start);
 	int present, swap;
 	int node = NUMA_NO_NODE;
-	int result = SCAN_SUCCEED;

+	cr->result = SCAN_SUCCEED;
 	present = 0;
 	swap = 0;
 	memset(cc->node_load, 0, sizeof(cc->node_load));
@@ -2074,7 +2085,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,

 		if (xa_is_value(page)) {
 			if (++swap > khugepaged_max_ptes_swap) {
-				result = SCAN_EXCEED_SWAP_PTE;
+				cr->result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
 			}
@@ -2086,25 +2097,25 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		 * into a PMD sized page
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			cr->result = SCAN_PAGE_COMPOUND;
 			break;
 		}

 		node = page_to_nid(page);
 		if (khugepaged_scan_abort(node, cc)) {
-			result = SCAN_SCAN_ABORT;
+			cr->result = SCAN_SCAN_ABORT;
 			break;
 		}
 		cc->node_load[node]++;

 		if (!PageLRU(page)) {
-			result = SCAN_PAGE_LRU;
+			cr->result = SCAN_PAGE_LRU;
 			break;
 		}

 		if (page_count(page) !=
 		    1 + page_mapcount(page) + page_has_private(page)) {
-			result = SCAN_PAGE_COUNT;
+			cr->result = SCAN_PAGE_COUNT;
 			break;
 		}
@@ -2123,12 +2134,12 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 	}
 	rcu_read_unlock();

-	if (result == SCAN_SUCCEED) {
+	if (cr->result == SCAN_SUCCEED) {
 		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
-			result = SCAN_EXCEED_NONE_PTE;
+			cr->result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			collapse_file(mm, file, start, cc);
+			collapse_file(mm, file, start, cc, cr);
 		}
 	}

@@ -2137,7 +2148,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
 				 struct file *file, pgoff_t start,
-				 struct collapse_control *cc)
+				 struct collapse_control *cc,
+				 struct collapse_result *cr)
 {
 	BUILD_BUG();
 }
@@ -2209,7 +2221,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 			goto skip;

 		while (khugepaged_scan.address < hend) {
-			int ret;
+			struct collapse_result cr = {0};
 			cond_resched();
 			if (unlikely(khugepaged_test_exit(mm)))
 				goto breakouterloop;
@@ -2223,17 +2235,20 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 						khugepaged_scan.address);

 				mmap_read_unlock(mm);
-				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, cc);
+				cr.dropped_mmap_lock = true;
+				khugepaged_scan_file(mm, file, pgoff, cc, &cr);
 				fput(file);
 			} else {
-				ret = khugepaged_scan_pmd(mm, vma,
-						khugepaged_scan.address, cc);
+				khugepaged_scan_pmd(mm, vma,
+						    khugepaged_scan.address,
+						    cc, &cr);
 			}
+			if (cr.result == SCAN_SUCCEED)
+				++khugepaged_pages_collapsed;
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
 			progress += HPAGE_PMD_NR;
-			if (ret)
+			if (cr.dropped_mmap_lock)
 				/* we released mmap_lock so break loop */
 				goto breakouterloop_mmap_lock;
 			if (progress >= pages)
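
The caller-side pattern this enables — helpers fill in an aggregate
result, and the scan loop does the bookkeeping — can be sketched in
isolation (user-space C with hypothetical names, not kernel code):

    #include <stdbool.h>
    #include <stdio.h>

    enum scan_result { SCAN_FAIL, SCAN_SUCCEED };

    /* Aggregate outcome of one scan request, filled in by the callee. */
    struct scan_outcome {
            enum scan_result result;
            bool dropped_lock;          /* did the callee drop the caller's lock? */
    };

    static void scan_one(int i, struct scan_outcome *out)
    {
            out->result = (i % 2) ? SCAN_SUCCEED : SCAN_FAIL;
            out->dropped_lock = (out->result == SCAN_SUCCEED);
    }

    int main(void)
    {
            int collapsed = 0;

            for (int i = 0; i < 4; i++) {
                    struct scan_outcome o = { SCAN_FAIL, false };

                    scan_one(i, &o);
                    if (o.result == SCAN_SUCCEED)   /* bookkeeping lives in the caller */
                            collapsed++;
                    if (o.dropped_lock)             /* ...as does the lock-drop reaction */
                            continue;               /* (a real caller would revalidate) */
            }
            printf("collapsed: %d\n", collapsed);
            return 0;
    }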
From patchwork Tue Apr 26 14:44:05 2022
Date: Tue, 26 Apr 2022 07:44:05 -0700
Message-Id: <20220426144412.742113-6-zokeefe@google.com>
Subject: [PATCH v3 05/12] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
From: "Zach O'Keefe"
This idea was introduced by David Rientjes[1].

Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request
a synchronous collapse of memory at their own expense. The benefits of
this approach are:

* CPU cycles are charged to the process that wants to spend them on the THP
* The unpredictable timing of khugepaged collapse is avoided

Immediate users of this new functionality are malloc() implementations
that manage memory in hugepage-sized chunks, but sometimes subrelease
memory back to the system in native-sized chunks via MADV_DONTNEED,
zapping the pmd. Later, when the memory is hot, the implementation could
madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage
coverage and dTLB performance. TCMalloc is such an implementation that
could benefit from this[2].

Only privately-mapped anon memory is supported for now, but it is expected
that file and shmem support will be added later to support the use-case of
backing executable text by THPs. The current support provided by
CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system, which
might impair services from serving at their full rated load after
(re)starting. Tricks like mremap(2)'ing text onto anonymous memory to
immediately realize iTLB performance prevent page sharing and demand
paging, both of which increase the steady-state memory footprint. With
MADV_COLLAPSE, we get the best of both worlds: peak upfront performance
and a lower RAM footprint.

This call respects THP eligibility as determined by the system-wide sysfs
settings and the VMA flags for the memory range being collapsed.
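For illustration, a minimal userspace sketch of the intended usage. The
MADV_COLLAPSE value below matches the asm-generic definition proposed by this
patch but remains an assumption until the series lands, and the 2MiB huge
page size assumes x86-64:

	#include <stdint.h>
	#include <stdio.h>
	#include <sys/mman.h>

	#ifndef MADV_COLLAPSE
	#define MADV_COLLAPSE 25	/* proposed asm-generic value */
	#endif

	#define HPAGE_SIZE (2UL << 20)	/* assumes 2MiB PMD-sized pages */

	int main(void)
	{
		/* Over-allocate so a PMD-aligned region can be carved out. */
		uint8_t *raw = mmap(NULL, 2 * HPAGE_SIZE,
				    PROT_READ | PROT_WRITE,
				    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (raw == MAP_FAILED)
			return 1;
		uint8_t *p = (uint8_t *)(((uintptr_t)raw + HPAGE_SIZE - 1) &
					 ~(HPAGE_SIZE - 1));

		/* Fault in with base pages, then request a synchronous collapse. */
		for (size_t i = 0; i < HPAGE_SIZE; i += 4096)
			p[i] = 1;
		if (madvise(p, HPAGE_SIZE, MADV_COLLAPSE))
			perror("madvise(MADV_COLLAPSE)"); /* EINVAL/EAGAIN/ENOMEM */
		return 0;
	}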
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
[2] https://github.com/google/tcmalloc/tree/master/tcmalloc

Suggested-by: David Rientjes
Signed-off-by: Zach O'Keefe
Reported-by: kernel test robot
---
 arch/alpha/include/uapi/asm/mman.h     |   2 +
 arch/mips/include/uapi/asm/mman.h      |   2 +
 arch/parisc/include/uapi/asm/mman.h    |   2 +
 arch/xtensa/include/uapi/asm/mman.h    |   2 +
 include/linux/huge_mm.h                |  12 ++
 include/uapi/asm-generic/mman-common.h |   2 +
 mm/khugepaged.c                        | 158 +++++++++++++++++++++++--
 mm/madvise.c                           |   5 +
 8 files changed, 173 insertions(+), 12 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 4aa996423b0d..763929e814e9 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -76,6 +76,8 @@
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */

+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0

diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 1be428663c10..c6e1fc77c996 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -103,6 +103,8 @@
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */

+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0

diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index a7ea3204a5fa..22133a6a506e 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -70,6 +70,8 @@
 #define MADV_WIPEONFORK 71		/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 72		/* Undo MADV_WIPEONFORK */

+#define MADV_COLLAPSE	73		/* Synchronous hugepage collapse */
+
 #define MADV_HWPOISON     100		/* poison a page for testing */
 #define MADV_SOFT_OFFLINE 101		/* soft offline page for testing */

diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 7966a58af472..1ff0c858544f 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -111,6 +111,8 @@
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */

+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 816a9937f30e..ddad7c7af44e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -236,6 +236,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,

 int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
 		     int advice);
+int madvise_collapse(struct vm_area_struct *vma,
+		     struct vm_area_struct **prev,
+		     unsigned long start, unsigned long end);
 void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 			   unsigned long end, long adjust_next);
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
@@ -392,6 +395,15 @@ static inline int hugepage_madvise(struct vm_area_struct *vma,
 	BUG();
 	return 0;
 }
+
+static inline int madvise_collapse(struct vm_area_struct *vma,
+				   struct vm_area_struct **prev,
+				   unsigned long start, unsigned long end)
+{
+	BUG();
+	return 0;
+}
+
 static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
 					 unsigned long start,
 					 unsigned long end,
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6c1aa92a92e4..6ce1f1ceb432 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -77,6 +77,8 @@
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */

+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0e4f5fbe00d2..098919d0324b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -847,6 +847,23 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 	return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
 }

+static struct page *alloc_hpage(struct collapse_control *cc, gfp_t gfp,
+				int node)
+{
+	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);
+
+	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	if (unlikely(!cc->hpage)) {
+		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+		cc->hpage = ERR_PTR(-ENOMEM);
+		return NULL;
+	}
+
+	prep_transhuge_page(cc->hpage);
+	count_vm_event(THP_COLLAPSE_ALLOC);
+	return cc->hpage;
+}
+
 #ifdef CONFIG_NUMA
 static int khugepaged_find_target_node(struct collapse_control *cc)
 {
@@ -893,18 +910,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 static struct page *khugepaged_alloc_page(struct collapse_control *cc,
 					  gfp_t gfp, int node)
 {
-	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);
-
-	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!cc->hpage)) {
-		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		cc->hpage = ERR_PTR(-ENOMEM);
-		return NULL;
-	}
-
-	prep_transhuge_page(cc->hpage);
-	count_vm_event(THP_COLLAPSE_ALLOC);
-	return cc->hpage;
+	return alloc_hpage(cc, gfp, node);
 }
 #else
 static int khugepaged_find_target_node(struct collapse_control *cc)
@@ -2471,3 +2477,131 @@ void khugepaged_min_free_kbytes_update(void)
 		set_recommended_min_free_kbytes();
 	mutex_unlock(&khugepaged_mutex);
 }
+
+static inline gfp_t alloc_hugepage_madvise_gfpmask(void)
+{
+	return GFP_TRANSHUGE;
+}
+
+static void madvise_collapse_cleanup_page(struct page **hpage)
+{
+	if (!IS_ERR(*hpage) && *hpage)
+		put_page(*hpage);
+	*hpage = NULL;
+}
+
+static int madvise_collapse_errno(enum scan_result r)
+{
+	switch (r) {
+	case SCAN_PMD_NULL:
+	case SCAN_ADDRESS_RANGE:
+	case SCAN_VMA_NULL:
+	case SCAN_PTE_NON_PRESENT:
+	case SCAN_PAGE_NULL:
+		/*
+		 * Addresses in the specified range are not currently mapped,
+		 * or are outside the AS of the process.
+		 */
+		return -ENOMEM;
+	case SCAN_ALLOC_HUGE_PAGE_FAIL:
+	case SCAN_CGROUP_CHARGE_FAIL:
+		/* A kernel resource was temporarily unavailable. */
+		return -EAGAIN;
+	default:
+		return -EINVAL;
+	}
+}
+
+int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
+		     unsigned long start, unsigned long end)
+{
+	struct collapse_control cc = {
+		.last_target_node = NUMA_NO_NODE,
+		.hpage = NULL,
+		.gfp = &alloc_hugepage_madvise_gfpmask,
+		.alloc_hpage = &alloc_hpage,
+	};
+	struct mm_struct *mm = vma->vm_mm;
+	struct collapse_result cr;
+	unsigned long hstart, hend, addr;
+	int thps = 0, nr_hpages = 0;
+
+	BUG_ON(vma->vm_start > start);
+	BUG_ON(vma->vm_end < end);
+
+	*prev = vma;
+
+	if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file)
+		return -EINVAL;
+
+	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
+	hend = end & HPAGE_PMD_MASK;
+	nr_hpages = (hend - hstart) >> HPAGE_PMD_SHIFT;
+
+	if (hstart >= hend || !transparent_hugepage_active(vma))
+		return -EINVAL;
+
+	mmgrab(mm);
+	lru_add_drain();
+
+	for (addr = hstart; ; ) {
+		mmap_assert_locked(mm);
+		cond_resched();
+		memset(&cr, 0, sizeof(cr));
+
+		if (unlikely(khugepaged_test_exit(mm))) {
+			cr.result = SCAN_ANY_PROCESS;
+			break;
+		}
+
+		memset(cc.node_load, 0, sizeof(cc.node_load));
+		khugepaged_scan_pmd(mm, vma, addr, &cc, &cr);
+		if (cr.dropped_mmap_lock)
+			*prev = NULL;	/* tell madvise we dropped mmap_lock */
+
+		switch (cr.result) {
+		/* Whitelisted set of results where continuing OK */
+		case SCAN_SUCCEED:
+		case SCAN_PMD_MAPPED:
+			++thps;
+		case SCAN_PMD_NULL:
+		case SCAN_PTE_NON_PRESENT:
+		case SCAN_PTE_UFFD_WP:
+		case SCAN_PAGE_RO:
+		case SCAN_LACK_REFERENCED_PAGE:
+		case SCAN_PAGE_NULL:
+		case SCAN_PAGE_COUNT:
+		case SCAN_PAGE_LOCK:
+		case SCAN_PAGE_COMPOUND:
+			break;
+		case SCAN_PAGE_LRU:
+			lru_add_drain_all();
+			goto retry;
+		default:
+			/* Other error, exit */
+			goto break_loop;
+		}
+		addr += HPAGE_PMD_SIZE;
+		if (addr >= hend)
+			break;
+retry:
+		if (cr.dropped_mmap_lock) {
+			mmap_read_lock(mm);
+			cr.result = hugepage_vma_revalidate(mm, addr, &vma);
+			if (cr.result)
+				goto out;
+		}
+		madvise_collapse_cleanup_page(&cc.hpage);
+	}
+
+break_loop:
+	/* madvise_walk_vmas() expects us to hold mmap_lock on return */
+	if (cr.dropped_mmap_lock)
+		mmap_read_lock(mm);
+out:
+	mmap_assert_locked(mm);
+	madvise_collapse_cleanup_page(&cc.hpage);
+	mmdrop(mm);
+
+	return thps == nr_hpages ? 0 : madvise_collapse_errno(cr.result);
+}
diff --git a/mm/madvise.c b/mm/madvise.c
index 5f4537511532..638517952bd2 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_FREE:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
+	case MADV_COLLAPSE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -1054,6 +1055,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		if (error)
 			goto out;
 		break;
+	case MADV_COLLAPSE:
+		return madvise_collapse(vma, prev, start, end);
 	}

 	anon_name = anon_vma_name(vma);
@@ -1147,6 +1150,7 @@ madvise_behavior_valid(int behavior)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	case MADV_HUGEPAGE:
 	case MADV_NOHUGEPAGE:
+	case MADV_COLLAPSE:
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:
@@ -1336,6 +1340,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 *  MADV_NOHUGEPAGE - mark the given range as not worth being backed by
 *		transparent huge pages so the existing pages will not be
 *		coalesced into THP and new pages will not be allocated as THP.
+*  MADV_COLLAPSE - synchronously coalesce pages into new THP.
 *  MADV_DONTDUMP - the application wants to prevent pages in the given range
 *		from being included in its core dump.
 *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
From patchwork Tue Apr 26 14:44:06 2022
Date: Tue, 26 Apr 2022 07:44:06 -0700
Message-Id: <20220426144412.742113-7-zokeefe@google.com>
Subject: [PATCH v3 06/12] mm/khugepaged: remove khugepaged prefix from shared collapse functions
From: "Zach O'Keefe"

The following functions/tracepoints are shared between khugepaged and
madvise collapse contexts. Remove the khugepaged prefixes:

	tracepoint:mm_khugepaged_scan_pmd -> tracepoint:mm_scan_pmd
	khugepaged_test_exit()            -> test_exit()
	khugepaged_scan_abort()           -> scan_abort()
	khugepaged_scan_pmd()             -> scan_pmd()
	khugepaged_find_target_node()     -> find_target_node()

Signed-off-by: Zach O'Keefe
Reported-by: kernel test robot
---
 include/trace/events/huge_memory.h |  2 +-
 mm/khugepaged.c                    | 70 ++++++++++++++----------------
 2 files changed, 34 insertions(+), 38 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 9faa678e0a5b..09be0e2f76b1 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -48,7 +48,7 @@ SCAN_STATUS
 #define EM(a, b)	{a, b},
 #define EMe(a, b)	{a, b}

-TRACE_EVENT(mm_khugepaged_scan_pmd,
+TRACE_EVENT(mm_scan_pmd,

 	TP_PROTO(struct mm_struct *mm, struct page *page, bool writable,
 		 int referenced, int none_or_zero, int status, int unmapped),
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 098919d0324b..a6881f5b3c67 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -90,7 +90,7 @@ struct collapse_control {
 	/* Num pages scanned per node */
 	int node_load[MAX_NUMNODES];

-	/* Last target selected in khugepaged_find_target_node() for this scan */
+	/* Last target selected in find_target_node() for this scan */
 	int last_target_node;

 	struct page *hpage;
@@ -454,7 +454,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
 	hash_add(mm_slots_hash, &mm_slot->hash, (long)mm);
 }

-static inline int khugepaged_test_exit(struct mm_struct *mm)
+static inline int test_exit(struct mm_struct *mm)
 {
 	return atomic_read(&mm->mm_users) == 0;
 }
@@ -506,7 +506,7 @@ void __khugepaged_enter(struct mm_struct *mm)
 		return;

 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
+	VM_BUG_ON_MM(test_exit(mm), mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
 		free_mm_slot(mm_slot);
 		return;
@@ -558,12 +558,11 @@ void __khugepaged_exit(struct mm_struct *mm)
 		mmdrop(mm);
 	} else if (mm_slot) {
 		/*
-		 * This is required to serialize against
-		 * khugepaged_test_exit() (which is guaranteed to run
-		 * under mmap sem read mode). Stop here (after we
-		 * return all pagetables will be destroyed) until
-		 * khugepaged has finished working on the pagetables
-		 * under the mmap_lock.
+		 * This is required to serialize against test_exit() (which is
+		 * guaranteed to run under mmap sem read mode). Stop here
+		 * (after we return all pagetables will be destroyed) until
+		 * khugepaged has finished working on the pagetables under
+		 * the mmap_lock.
 		 */
 		mmap_write_lock(mm);
 		mmap_write_unlock(mm);
@@ -817,7 +816,7 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }

-static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
+static bool scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
@@ -865,7 +864,7 @@ static struct page *alloc_hpage(struct collapse_control *cc, gfp_t gfp,
 }

 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(struct collapse_control *cc)
+static int find_target_node(struct collapse_control *cc)
 {
 	int nid, target_node = 0, max_value = 0;
@@ -913,7 +912,7 @@ static struct page *khugepaged_alloc_page(struct collapse_control *cc,
 	return alloc_hpage(cc, gfp, node);
 }
 #else
-static int khugepaged_find_target_node(struct collapse_control *cc)
+static int find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -994,7 +993,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	struct vm_area_struct *vma;
 	unsigned long hstart, hend;

-	if (unlikely(khugepaged_test_exit(mm)))
+	if (unlikely(test_exit(mm)))
 		return SCAN_ANY_PROCESS;

 	*vmap = vma = find_vma(mm, address);
@@ -1038,7 +1037,7 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,

 /*
  * Bring missing pages in from swap, to complete THP collapse.
- * Only done if khugepaged_scan_pmd believes it is worthwhile.
+ * Only done if scan_pmd believes it is worthwhile.
  *
  * Called and returns without pte mapped or spinlocks held,
  * but with mmap_lock held to protect against vma changes.
@@ -1130,7 +1129,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmap_read_unlock(mm);
 	cr->dropped_mmap_lock = true;

-	node = khugepaged_find_target_node(cc);
+	node = find_target_node(cc);
 	/* sched to specified node before huage page memory copy */
 	if (task_node(current) != node) {
 		cpumask = cpumask_of_node(node);
@@ -1271,11 +1270,9 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	return;
 }

-static void khugepaged_scan_pmd(struct mm_struct *mm,
-				struct vm_area_struct *vma,
-				unsigned long address,
-				struct collapse_control *cc,
-				struct collapse_result *cr)
+static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
+		     unsigned long address, struct collapse_control *cc,
+		     struct collapse_result *cr)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1365,7 +1362,7 @@ static void khugepaged_scan_pmd(struct mm_struct *mm,
 		 * hit record.
 		 */
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node, cc)) {
+		if (scan_abort(node, cc)) {
 			cr->result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
@@ -1422,8 +1419,8 @@ static void khugepaged_scan_pmd(struct mm_struct *mm,
 		/* collapse_huge_page will return with the mmap_lock released */
 		collapse_huge_page(mm, address, cc, referenced, unmapped, cr);
 out:
-	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
-				     none_or_zero, cr->result, unmapped);
+	trace_mm_scan_pmd(mm, page, writable, referenced, none_or_zero,
+			  cr->result, unmapped);
 }

 static void collect_mm_slot(struct mm_slot *mm_slot)
@@ -1432,7 +1429,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)

 	lockdep_assert_held(&khugepaged_mm_lock);

-	if (khugepaged_test_exit(mm)) {
+	if (test_exit(mm)) {
 		/* free mm_slot */
 		hash_del(&mm_slot->hash);
 		list_del(&mm_slot->mm_node);
@@ -1603,7 +1600,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 	if (!mmap_write_trylock(mm))
 		return;

-	if (unlikely(khugepaged_test_exit(mm)))
+	if (unlikely(test_exit(mm)))
 		goto out;

 	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
@@ -1666,7 +1663,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 			 * it'll always mapped in small page size for uffd-wp
 			 * registered ranges.
 			 */
-			if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma))
+			if (!test_exit(mm) && !userfaultfd_wp(vma))
 				collapse_and_free_pmd(mm, vma, addr, pmd);
 			mmap_write_unlock(mm);
 		} else {
@@ -1724,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm,
 	/* Only allocate from the target node */
 	gfp = cc->gfp() | __GFP_THISNODE;

-	node = khugepaged_find_target_node(cc);
+	node = find_target_node(cc);

 	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
@@ -2108,7 +2105,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		}

 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node, cc)) {
+		if (scan_abort(node, cc)) {
 			cr->result = SCAN_SCAN_ABORT;
 			break;
 		}
@@ -2197,7 +2194,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 	vma = NULL;
 	if (unlikely(!mmap_read_trylock(mm)))
 		goto breakouterloop_mmap_lock;
-	if (likely(!khugepaged_test_exit(mm)))
+	if (likely(!test_exit(mm)))
 		vma = find_vma(mm, khugepaged_scan.address);

 	progress++;
@@ -2205,7 +2202,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		unsigned long hstart, hend;

 		cond_resched();
-		if (unlikely(khugepaged_test_exit(mm))) {
+		if (unlikely(test_exit(mm))) {
 			progress++;
 			break;
 		}
@@ -2229,7 +2226,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		while (khugepaged_scan.address < hend) {
 			struct collapse_result cr = {0};
 			cond_resched();
-			if (unlikely(khugepaged_test_exit(mm)))
+			if (unlikely(test_exit(mm)))
 				goto breakouterloop;

 			VM_BUG_ON(khugepaged_scan.address < hstart ||
@@ -2245,9 +2242,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				khugepaged_scan_file(mm, file, pgoff, cc, &cr);
 				fput(file);
 			} else {
-				khugepaged_scan_pmd(mm, vma,
-						    khugepaged_scan.address,
-						    cc, &cr);
+				scan_pmd(mm, vma, khugepaged_scan.address, cc,
+					 &cr);
 			}
 			if (cr.result == SCAN_SUCCEED)
 				++khugepaged_pages_collapsed;
@@ -2271,7 +2267,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 	 * Release the current mm_slot if this mm is about to die, or
 	 * if we scanned all vmas of this mm.
 	 */
-	if (khugepaged_test_exit(mm) || !vma) {
+	if (test_exit(mm) || !vma) {
 		/*
 		 * Make sure that if mm_users is reaching zero while
 		 * khugepaged runs here, khugepaged_exit will find
@@ -2549,13 +2545,13 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		cond_resched();
 		memset(&cr, 0, sizeof(cr));

-		if (unlikely(khugepaged_test_exit(mm))) {
+		if (unlikely(test_exit(mm))) {
 			cr.result = SCAN_ANY_PROCESS;
 			break;
 		}

 		memset(cc.node_load, 0, sizeof(cc.node_load));
-		khugepaged_scan_pmd(mm, vma, addr, &cc, &cr);
+		scan_pmd(mm, vma, addr, &cc, &cr);
 		if (cr.dropped_mmap_lock)
 			*prev = NULL;	/* tell madvise we dropped mmap_lock */
From patchwork Tue Apr 26 14:44:07 2022
Date: Tue, 26 Apr 2022 07:44:07 -0700
Message-Id: <20220426144412.742113-8-zokeefe@google.com>
Subject: [PATCH v3 07/12] mm/khugepaged: add flag to ignore khugepaged_max_ptes_*
From: "Zach O'Keefe"

Add an enforce_pte_scan_limits flag to struct collapse_control that
allows a context to ignore the sysfs-controlled knobs
khugepaged_max_ptes_[none|swap|shared].

Set this flag in the khugepaged collapse context to preserve existing
khugepaged behavior, and clear it in the madvise collapse context, since
the user presumably has reason to believe the collapse will be
beneficial.
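Illustrative sketch only (not part of the patch): every limit check touched
below reduces to the same shape, "limits are enforced AND the knob is
exceeded", so a context with enforce_pte_scan_limits == false can never trip
a knob. For example:

	static bool exceeds_max_ptes_none(struct collapse_control *cc,
					  int none_or_zero)
	{
		/* khugepaged sets enforce_pte_scan_limits; MADV_COLLAPSE clears it */
		return cc->enforce_pte_scan_limits &&
		       none_or_zero > khugepaged_max_ptes_none;
	}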
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a6881f5b3c67..57725482290d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -87,6 +87,9 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 #define MAX_PTE_MAPPED_THP 8

 struct collapse_control {
+	/* Respect khugepaged_max_ptes_[none|swap|shared] */
+	bool enforce_pte_scan_limits;
+
 	/* Num pages scanned per node */
 	int node_load[MAX_NUMNODES];
@@ -632,6 +635,7 @@ static bool is_refcount_suitable(struct page *page)
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
+					struct collapse_control *cc,
 					struct list_head *compound_pagelist)
 {
 	struct page *page = NULL;
@@ -645,7 +649,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_none(pteval) || (pte_present(pteval) &&
 				is_zero_pfn(pte_pfn(pteval)))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->enforce_pte_scan_limits)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -665,8 +670,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,

 		VM_BUG_ON_PAGE(!PageAnon(page), page);

-		if (page_mapcount(page) > 1 &&
-				++shared > khugepaged_max_ptes_shared) {
+		if (cc->enforce_pte_scan_limits && page_mapcount(page) > 1 &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out;
@@ -1208,7 +1213,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmu_notifier_invalidate_range_end(&range);

 	spin_lock(pte_ptl);
-	cr->result = __collapse_huge_page_isolate(vma, address, pte,
+	cr->result = __collapse_huge_page_isolate(vma, address, pte, cc,
 						  &compound_pagelist);
 	spin_unlock(pte_ptl);
@@ -1297,7 +1302,8 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
-			if (++unmapped <= khugepaged_max_ptes_swap) {
+			if (++unmapped <= khugepaged_max_ptes_swap ||
+			    !cc->enforce_pte_scan_limits) {
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries.  Please see
@@ -1316,7 +1322,8 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->enforce_pte_scan_limits)) {
 				continue;
 			} else {
 				cr->result = SCAN_EXCEED_NONE_PTE;
@@ -1346,8 +1353,9 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 			goto out_unmap;
 		}

-		if (page_mapcount(page) > 1 &&
-				++shared > khugepaged_max_ptes_shared) {
+		if (cc->enforce_pte_scan_limits &&
+		    page_mapcount(page) > 1 &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			cr->result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
@@ -2087,7 +2095,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			continue;

 		if (xa_is_value(page)) {
-			if (++swap > khugepaged_max_ptes_swap) {
+			if (cc->enforce_pte_scan_limits &&
+			    ++swap > khugepaged_max_ptes_swap) {
 				cr->result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
@@ -2138,7 +2147,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 	rcu_read_unlock();

 	if (cr->result == SCAN_SUCCEED) {
-		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
+		    cc->enforce_pte_scan_limits) {
 			cr->result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
@@ -2365,6 +2375,7 @@ static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
+		.enforce_pte_scan_limits = true,
 		.last_target_node = NUMA_NO_NODE,
 		.gfp = &alloc_hugepage_khugepaged_gfpmask,
 		.alloc_hpage = &khugepaged_alloc_page,
@@ -2512,6 +2523,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		     unsigned long start, unsigned long end)
 {
 	struct collapse_control cc = {
+		.enforce_pte_scan_limits = false,
 		.last_target_node = NUMA_NO_NODE,
 		.hpage = NULL,
 		.gfp = &alloc_hugepage_madvise_gfpmask,
From patchwork Tue Apr 26 14:44:08 2022
Date: Tue, 26 Apr 2022 07:44:08 -0700
Message-Id: <20220426144412.742113-9-zokeefe@google.com>
Subject: [PATCH v3 08/12] mm/khugepaged: add flag to ignore page young/referenced requirement
From: "Zach O'Keefe"

Add an enforce_young flag to struct collapse_control that allows a
context to ignore the requirement that some pages in the region being
collapsed be young or referenced.
Set this flag in the khugepaged collapse context to preserve existing
khugepaged behavior, and clear it in the madvise collapse context, since
the user presumably has reason to believe the collapse will be
beneficial.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 57725482290d..fe6810825259 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -90,6 +90,9 @@ struct collapse_control {
 	/* Respect khugepaged_max_ptes_[none|swap|shared] */
 	bool enforce_pte_scan_limits;

+	/* Require memory to be young */
+	bool enforce_young;
+
 	/* Num pages scanned per node */
 	int node_load[MAX_NUMNODES];
@@ -738,9 +741,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		list_add_tail(&page->lru, compound_pagelist);
next:
 		/* There should be enough young pte to collapse the page */
-		if (pte_young(pteval) ||
-		    page_is_young(page) || PageReferenced(page) ||
-		    mmu_notifier_test_young(vma->vm_mm, address))
+		if (cc->enforce_young &&
+		    (pte_young(pteval) || page_is_young(page) ||
+		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
+								     address)))
 			referenced++;

 		if (pte_write(pteval))
@@ -749,7 +753,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,

 	if (unlikely(!writable)) {
 		result = SCAN_PAGE_RO;
-	} else if (unlikely(!referenced)) {
+	} else if (unlikely(cc->enforce_young && !referenced)) {
 		result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		result = SCAN_SUCCEED;
@@ -1409,14 +1413,16 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 			cr->result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
-		if (pte_young(pteval) ||
-		    page_is_young(page) || PageReferenced(page) ||
-		    mmu_notifier_test_young(vma->vm_mm, address))
+		if (cc->enforce_young &&
+		    (pte_young(pteval) || page_is_young(page) ||
+		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
+								     address)))
 			referenced++;
 	}
 	if (!writable) {
 		cr->result = SCAN_PAGE_RO;
-	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
+	} else if (cc->enforce_young && (!referenced || (unmapped && referenced
+				< HPAGE_PMD_NR / 2))) {
 		cr->result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		cr->result = SCAN_SUCCEED;
@@ -2376,6 +2382,7 @@ static int khugepaged(void *none)
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
 		.enforce_pte_scan_limits = true,
+		.enforce_young = true,
 		.last_target_node = NUMA_NO_NODE,
 		.gfp = &alloc_hugepage_khugepaged_gfpmask,
 		.alloc_hpage = &khugepaged_alloc_page,
@@ -2524,6 +2531,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 {
 	struct collapse_control cc = {
 		.enforce_pte_scan_limits = false,
+		.enforce_young = false,
 		.last_target_node = NUMA_NO_NODE,
 		.hpage = NULL,
 		.gfp = &alloc_hugepage_madvise_gfpmask,
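Sketch (not in the patch): the young/referenced test that enforce_young
gates appears twice above; folded into one helper, it reads:

	static bool pte_counts_as_young(struct vm_area_struct *vma,
					unsigned long address, pte_t pteval,
					struct page *page)
	{
		return pte_young(pteval) || page_is_young(page) ||
		       PageReferenced(page) ||
		       mmu_notifier_test_young(vma->vm_mm, address);
	}

With enforce_young clear, MADV_COLLAPSE skips this test entirely and can
never fail with SCAN_LACK_REFERENCED_PAGE.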
From patchwork Tue Apr 26 14:44:09 2022
Date: Tue, 26 Apr 2022 07:44:09 -0700
Message-Id: <20220426144412.742113-10-zokeefe@google.com>
Subject: [PATCH v3 09/12] mm/madvise: add MADV_COLLAPSE to process_madvise()
From: "Zach O'Keefe"
Allow MADV_COLLAPSE behavior for process_madvise(2) if the caller has
CAP_SYS_ADMIN or is requesting the collapse of its own memory.

Signed-off-by: Zach O'Keefe
---
 mm/madvise.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 638517952bd2..08c11217025a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior)
 }

 static bool
-process_madvise_behavior_valid(int behavior)
+process_madvise_behavior_valid(int behavior, struct task_struct *task)
 {
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_WILLNEED:
 		return true;
+	case MADV_COLLAPSE:
+		return task == current || capable(CAP_SYS_ADMIN);
 	default:
 		return false;
 	}
@@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto free_iov;
 	}

-	if (!process_madvise_behavior_valid(behavior)) {
+	if (!process_madvise_behavior_valid(behavior, task)) {
 		ret = -EINVAL;
 		goto release_task;
 	}
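A userspace sketch of driving this from a manager process. It assumes a
kernel with this series, uses the raw syscall since the libc in use may lack
a process_madvise() wrapper (on older headers SYS_process_madvise may need to
be spelled __NR_process_madvise), and the MADV_COLLAPSE value is the one
proposed by this series:

	#include <stdio.h>
	#include <sys/syscall.h>
	#include <sys/uio.h>
	#include <unistd.h>

	#ifndef MADV_COLLAPSE
	#define MADV_COLLAPSE 25	/* proposed asm-generic value */
	#endif

	/*
	 * Request collapse of [addr, addr + len) in the process behind pidfd.
	 * Per this patch, this needs CAP_SYS_ADMIN unless pidfd refers to the
	 * calling process itself; addr must be valid in the target process.
	 */
	static long collapse_remote(int pidfd, void *addr, size_t len)
	{
		struct iovec iov = { .iov_base = addr, .iov_len = len };

		return syscall(SYS_process_madvise, pidfd, &iov, 1,
			       MADV_COLLAPSE, 0);
	}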
From patchwork Tue Apr 26 14:44:10 2022
Date: Tue, 26 Apr 2022 07:44:10 -0700
Message-Id: <20220426144412.742113-11-zokeefe@google.com>
Subject: [PATCH v3 10/12] selftests/vm: modularize collapse selftests
From: "Zach O'Keefe"

Modularize the collapse action of the khugepaged collapse selftests by
introducing a struct collapse_context, which specifies how to collapse a
given memory range and the expected semantics of the collapse. This can
be reused later to test other collapse contexts.
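A hypothetical sketch of the second context this refactor enables (the real
madvise-based context is wired up later in the series and may differ):
collapse() calls madvise(MADV_COLLAPSE) on the range instead of waiting for
khugepaged, then checks the expected outcome the same way; the identifiers
used are the selftest's own helpers:

	static void madvise_collapse_func(const char *msg, char *p, bool expect)
	{
		printf("%s...", msg);
		/* MADV_COLLAPSE as defined by this series; result checked below */
		madvise(p, hpage_pmd_size, MADV_COLLAPSE);
		if (!!check_huge(p) == expect)
			success("OK");
		else
			fail("Fail");
	}

	static struct collapse_context madvise_context = {
		.name = "madvise",
		.collapse = madvise_collapse_func,
		.enforce_pte_scan_limits = false,
	};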
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 257 +++++++++++------------- 1 file changed, 116 insertions(+), 141 deletions(-) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index 155120b67a16..c59d832fee96 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -23,6 +23,12 @@ static int hpage_pmd_nr; #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/" #define PID_SMAPS "/proc/self/smaps" +struct collapse_context { + const char *name; + void (*collapse)(const char *msg, char *p, bool expect); + bool enforce_pte_scan_limits; +}; + enum thp_enabled { THP_ALWAYS, THP_MADVISE, @@ -528,53 +534,39 @@ static void alloc_at_fault(void) munmap(p, hpage_pmd_size); } -static void collapse_full(void) +static void collapse_full(struct collapse_context *context) { void *p; p = alloc_mapping(); fill_memory(p, 0, hpage_pmd_size); - if (wait_for_scan("Collapse fully populated PTE table", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse fully populated PTE table", p, true); validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); } -static void collapse_empty(void) +static void collapse_empty(struct collapse_context *context) { void *p; p = alloc_mapping(); - if (wait_for_scan("Do not collapse empty PTE table", p)) - fail("Timeout"); - else if (check_huge(p)) - fail("Fail"); - else - success("OK"); + context->collapse("Do not collapse empty PTE table", p, false); munmap(p, hpage_pmd_size); } -static void collapse_single_pte_entry(void) +static void collapse_single_pte_entry(struct collapse_context *context) { void *p; p = alloc_mapping(); fill_memory(p, 0, page_size); - if (wait_for_scan("Collapse PTE table with single PTE entry present", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table with single PTE entry present", p, + true); validate_memory(p, 0, page_size); munmap(p, hpage_pmd_size); } -static void collapse_max_ptes_none(void) +static void collapse_max_ptes_none(struct collapse_context *context) { int max_ptes_none = hpage_pmd_nr / 2; struct settings settings = default_settings; @@ -586,28 +578,23 @@ static void collapse_max_ptes_none(void) p = alloc_mapping(); fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); - if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p)) - fail("Timeout"); - else if (check_huge(p)) - fail("Fail"); - else - success("OK"); + context->collapse("Maybe collapse with max_ptes_none exceeded", p, + !context->enforce_pte_scan_limits); validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); - if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); - validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + if (context->enforce_pte_scan_limits) { + fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + context->collapse("Collapse with max_ptes_none PTEs empty", p, + true); + validate_memory(p, 0, + (hpage_pmd_nr - max_ptes_none) * page_size); + } munmap(p, hpage_pmd_size); write_settings(&default_settings); } -static void collapse_swapin_single_pte(void) +static void collapse_swapin_single_pte(struct collapse_context *context) { void *p; p = alloc_mapping(); @@ -625,18 +612,14 @@ 
static void collapse_swapin_single_pte(void) goto out; } - if (wait_for_scan("Collapse with swapping in single PTE entry", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse with swapping in single PTE entry", + p, true); validate_memory(p, 0, hpage_pmd_size); out: munmap(p, hpage_pmd_size); } -static void collapse_max_ptes_swap(void) +static void collapse_max_ptes_swap(struct collapse_context *context) { int max_ptes_swap = read_num("khugepaged/max_ptes_swap"); void *p; @@ -656,39 +639,34 @@ static void collapse_max_ptes_swap(void) goto out; } - if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p)) - fail("Timeout"); - else if (check_huge(p)) - fail("Fail"); - else - success("OK"); + context->collapse("Maybe collapse with max_ptes_swap exceeded", + p, !context->enforce_pte_scan_limits); validate_memory(p, 0, hpage_pmd_size); - fill_memory(p, 0, hpage_pmd_size); - printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr); - if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) { - perror("madvise(MADV_PAGEOUT)"); - exit(EXIT_FAILURE); - } - if (check_swap(p, max_ptes_swap * page_size)) { - success("OK"); - } else { - fail("Fail"); - goto out; - } + if (context->enforce_pte_scan_limits) { + fill_memory(p, 0, hpage_pmd_size); + printf("Swapout %d of %d pages...", max_ptes_swap, + hpage_pmd_nr); + if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) { + perror("madvise(MADV_PAGEOUT)"); + exit(EXIT_FAILURE); + } + if (check_swap(p, max_ptes_swap * page_size)) { + success("OK"); + } else { + fail("Fail"); + goto out; + } - if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); - validate_memory(p, 0, hpage_pmd_size); + context->collapse("Collapse with max_ptes_swap pages swapped out", + p, true); + validate_memory(p, 0, hpage_pmd_size); + } out: munmap(p, hpage_pmd_size); } -static void collapse_single_pte_entry_compound(void) +static void collapse_single_pte_entry_compound(struct collapse_context *context) { void *p; @@ -710,17 +688,13 @@ static void collapse_single_pte_entry_compound(void) else fail("Fail"); - if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table with single PTE mapping compound page", + p, true); validate_memory(p, 0, page_size); munmap(p, hpage_pmd_size); } -static void collapse_full_of_compound(void) +static void collapse_full_of_compound(struct collapse_context *context) { void *p; @@ -742,17 +716,12 @@ static void collapse_full_of_compound(void) else fail("Fail"); - if (wait_for_scan("Collapse PTE table full of compound pages", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table full of compound pages", p, true); validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); } -static void collapse_compound_extreme(void) +static void collapse_compound_extreme(struct collapse_context *context) { void *p; int i; @@ -798,18 +767,14 @@ static void collapse_compound_extreme(void) else fail("Fail"); - if (wait_for_scan("Collapse PTE table full of different compound pages", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table full of different compound pages", + p, true); 
validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); } -static void collapse_fork(void) +static void collapse_fork(struct collapse_context *context) { int wstatus; void *p; @@ -835,13 +800,8 @@ static void collapse_fork(void) fail("Fail"); fill_memory(p, page_size, 2 * page_size); - - if (wait_for_scan("Collapse PTE table with single page shared with parent process", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table with single page shared with parent process", + p, true); validate_memory(p, 0, page_size); munmap(p, hpage_pmd_size); @@ -860,7 +820,7 @@ static void collapse_fork(void) munmap(p, hpage_pmd_size); } -static void collapse_fork_compound(void) +static void collapse_fork_compound(struct collapse_context *context) { int wstatus; void *p; @@ -896,14 +856,10 @@ static void collapse_fork_compound(void) fill_memory(p, 0, page_size); write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1); - if (wait_for_scan("Collapse PTE table full of compound pages in child", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table full of compound pages in child", + p, true); write_num("khugepaged/max_ptes_shared", - default_settings.khugepaged.max_ptes_shared); + default_settings.khugepaged.max_ptes_shared); validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); @@ -922,7 +878,7 @@ static void collapse_fork_compound(void) munmap(p, hpage_pmd_size); } -static void collapse_max_ptes_shared() +static void collapse_max_ptes_shared(struct collapse_context *context) { int max_ptes_shared = read_num("khugepaged/max_ptes_shared"); int wstatus; @@ -957,28 +913,22 @@ static void collapse_max_ptes_shared() else fail("Fail"); - if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p)) - fail("Timeout"); - else if (!check_huge(p)) - success("OK"); - else - fail("Fail"); - - printf("Trigger CoW on page %d of %d...", - hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr); - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size); - if (!check_huge(p)) - success("OK"); - else - fail("Fail"); - - - if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Maybe collapse with max_ptes_shared exceeded", + p, !context->enforce_pte_scan_limits); + + if (context->enforce_pte_scan_limits) { + printf("Trigger CoW on page %d of %d...", + hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr); + fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * + page_size); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + context->collapse("Collapse with max_ptes_shared PTEs shared", + p, true); + } validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); @@ -997,8 +947,27 @@ static void collapse_max_ptes_shared() munmap(p, hpage_pmd_size); } +static void khugepaged_collapse(const char *msg, char *p, bool expect) +{ + if (wait_for_scan(msg, p)) + fail("Timeout"); + else if (check_huge(p) == expect) + success("OK"); + else + fail("Fail"); +} + int main(void) { + struct collapse_context contexts[] = { + { + .name = "khugepaged", + .collapse = &khugepaged_collapse, + .enforce_pte_scan_limits = true, + }, + }; + int i; + setbuf(stdout, NULL); page_size = getpagesize(); @@ -1014,18 +983,24 @@ int main(void) adjust_settings(); alloc_at_fault(); - collapse_full(); - collapse_empty(); - collapse_single_pte_entry(); - 
collapse_max_ptes_none(); - collapse_swapin_single_pte(); - collapse_max_ptes_swap(); - collapse_single_pte_entry_compound(); - collapse_full_of_compound(); - collapse_compound_extreme(); - collapse_fork(); - collapse_fork_compound(); - collapse_max_ptes_shared(); + + for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) { + struct collapse_context *c = &contexts[i]; + + printf("\n*** Testing context: %s ***\n", c->name); + collapse_full(c); + collapse_empty(c); + collapse_single_pte_entry(c); + collapse_max_ptes_none(c); + collapse_swapin_single_pte(c); + collapse_max_ptes_swap(c); + collapse_single_pte_entry_compound(c); + collapse_full_of_compound(c); + collapse_compound_extreme(c); + collapse_fork(c); + collapse_fork_compound(c); + collapse_max_ptes_shared(c); + } restore_settings(0); }
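To make the intent of the modularization concrete, a sketch only:
my_collapse and my_context below are hypothetical, not part of the
patch, and the sketch assumes the selftest's existing helpers
(check_huge(), success(), fail()). A new collapse context just supplies
its own collapse callback and limit semantics, and the shared
collapse_* tests then run against it unchanged.

/* Hypothetical additional context: the callback decides how the
 * hugepage-sized region at p gets collapsed, then reports whether
 * check_huge(p) matched the expectation. */
static void my_collapse(const char *msg, char *p, bool expect)
{
	printf("%s...", msg);
	/* ... trigger collapse of the hugepage-sized region at p ... */
	if (check_huge(p) == expect)
		success("OK");
	else
		fail("Fail");
}

static struct collapse_context my_context = {
	.name = "my-context",
	.collapse = &my_collapse,
	.enforce_pte_scan_limits = false,	/* skip max_ptes_* variants */
};

Adding my_context to the contexts[] array in main() would then be
enough to run every shared test in the new context.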
From patchwork Tue Apr 26 14:44:11 2022
From: "Zach O'Keefe"
Date: Tue, 26 Apr 2022 07:44:11 -0700
Subject: [PATCH v3 11/12] selftests/vm: add MADV_COLLAPSE collapse context to selftests
Message-Id: <20220426144412.742113-12-zokeefe@google.com>
In-Reply-To: <20220426144412.742113-1-zokeefe@google.com>

Add MADV_COLLAPSE selftests. Extend struct collapse_context to support
context initialization/cleanup. This is used by the madvise collapse
context to "disable" and "enable" khugepaged, since khugepaged would
otherwise interfere with the tests.

The mechanism used to "disable" khugepaged is a hack: it sets
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to
a large value and feeds khugepaged enough suitable VMAs/pages to keep
khugepaged sleeping for the duration of the madvise collapse tests.
Since khugepaged is woken whenever this file is written, enough VMAs
must be queued to put khugepaged back to sleep when the tests write to
the file in write_settings().
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 133 ++++++++++++++++++++++-- 1 file changed, 125 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index c59d832fee96..e0ccc9443f78 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -14,17 +14,23 @@ #ifndef MADV_PAGEOUT #define MADV_PAGEOUT 21 #endif +#ifndef MADV_COLLAPSE +#define MADV_COLLAPSE 25 +#endif #define BASE_ADDR ((void *)(1UL << 30)) static unsigned long hpage_pmd_size; static unsigned long page_size; static int hpage_pmd_nr; +static int num_khugepaged_wakeups; #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/" #define PID_SMAPS "/proc/self/smaps" struct collapse_context { const char *name; + bool (*init_context)(void); + bool (*cleanup_context)(void); void (*collapse)(const char *msg, char *p, bool expect); bool enforce_pte_scan_limits; }; @@ -264,6 +270,17 @@ static void write_num(const char *name, unsigned long num) } } +/* + * Use this macro instead of write_settings inside tests, and should + * be called at most once per callsite. + * + * Hack to statically count the number of times khugepaged is woken up due to + * writes to + * /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs, + * and is stored in __COUNTER__. + */ +#define WRITE_SETTINGS(s) do { __COUNTER__; write_settings(s); } while (0) + static void write_settings(struct settings *settings) { struct khugepaged_settings *khugepaged = &settings->khugepaged; @@ -332,7 +349,7 @@ static void adjust_settings(void) { printf("Adjust settings..."); - write_settings(&default_settings); + WRITE_SETTINGS(&default_settings); success("OK"); } @@ -440,20 +457,25 @@ static bool check_swap(void *addr, unsigned long size) return swap; } -static void *alloc_mapping(void) +static void *alloc_mapping_at(void *at, size_t size) { void *p; - p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE, - MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); - if (p != BASE_ADDR) { - printf("Failed to allocate VMA at %p\n", BASE_ADDR); + p = mmap(at, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, + -1, 0); + if (p != at) { + printf("Failed to allocate VMA at %p\n", at); exit(EXIT_FAILURE); } return p; } +static void *alloc_mapping(void) +{ + return alloc_mapping_at(BASE_ADDR, hpage_pmd_size); +} + static void fill_memory(int *p, unsigned long start, unsigned long end) { int i; @@ -573,7 +595,7 @@ static void collapse_max_ptes_none(struct collapse_context *context) void *p; settings.khugepaged.max_ptes_none = max_ptes_none; - write_settings(&settings); + WRITE_SETTINGS(&settings); p = alloc_mapping(); @@ -591,7 +613,7 @@ static void collapse_max_ptes_none(struct collapse_context *context) } munmap(p, hpage_pmd_size); - write_settings(&default_settings); + WRITE_SETTINGS(&default_settings); } static void collapse_swapin_single_pte(struct collapse_context *context) @@ -947,6 +969,87 @@ static void collapse_max_ptes_shared(struct collapse_context *context) munmap(p, hpage_pmd_size); } +static void madvise_collapse(const char *msg, char *p, bool expect) +{ + int ret; + + printf("%s...", msg); + /* Sanity check */ + if (check_huge(p)) { + printf("Unexpected huge page\n"); + exit(EXIT_FAILURE); + } + + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + ret = madvise(p, hpage_pmd_size, MADV_COLLAPSE); + if (((bool)ret) == expect) + fail("Fail: Bad return value"); + else if (check_huge(p) != expect) + fail("Fail: check_huge()"); + else + 
success("OK"); +} + +static struct khugepaged_disable_state { + void *p; + size_t map_size; +} khugepaged_disable_state; + +static bool disable_khugepaged(void) +{ + /* + * Hack to "disable" khugepaged by setting + * /transparent_hugepage/khugepaged/scan_sleep_millisecs to some large + * value, then feeding it enough suitable VMAs to scan and subsequently + * sleep. + * + * khugepaged is woken up on writes to + * /transparent_hugepage/khugepaged/scan_sleep_millisecs, so care must + * be taken to not inadvertently wake khugepaged in these tests. + * + * Feed khugepaged 1 hugepage-sized VMA to scan and sleep on, then + * N more for each time khugepaged would be woken up. + */ + size_t map_size = (num_khugepaged_wakeups + 1) * hpage_pmd_size; + void *p; + bool ret = true; + int full_scans; + int timeout = 6; /* 3 seconds */ + + default_settings.khugepaged.scan_sleep_millisecs = 1000 * 60 * 10; + default_settings.khugepaged.pages_to_scan = 1; + write_settings(&default_settings); + + p = alloc_mapping_at(((char *)BASE_ADDR) + (1UL << 30), map_size); + fill_memory(p, 0, map_size); + + full_scans = read_num("khugepaged/full_scans") + 2; + + printf("disabling khugepaged..."); + while (timeout--) { + if (read_num("khugepaged/full_scans") >= full_scans) { + fail("Fail"); + ret = false; + break; + } + printf("."); + usleep(TICK); + } + success("OK"); + khugepaged_disable_state.p = p; + khugepaged_disable_state.map_size = map_size; + return ret; +} + +static bool enable_khugepaged(void) +{ + printf("enabling khugepaged..."); + munmap(khugepaged_disable_state.p, khugepaged_disable_state.map_size); + write_settings(&saved_settings); + success("OK"); + return true; +} + static void khugepaged_collapse(const char *msg, char *p, bool expect) { if (wait_for_scan(msg, p)) @@ -962,9 +1065,18 @@ int main(void) struct collapse_context contexts[] = { { .name = "khugepaged", + .init_context = NULL, + .cleanup_context = NULL, .collapse = &khugepaged_collapse, .enforce_pte_scan_limits = true, }, + { + .name = "madvise", + .init_context = &disable_khugepaged, + .cleanup_context = &enable_khugepaged, + .collapse = &madvise_collapse, + .enforce_pte_scan_limits = false, + }, }; int i; @@ -973,6 +1085,7 @@ int main(void) page_size = getpagesize(); hpage_pmd_size = read_num("hpage_pmd_size"); hpage_pmd_nr = hpage_pmd_size / page_size; + num_khugepaged_wakeups = __COUNTER__; default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1; default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8; @@ -988,6 +1101,8 @@ int main(void) struct collapse_context *c = &contexts[i]; printf("\n*** Testing context: %s ***\n", c->name); + if (c->init_context && !c->init_context()) + continue; collapse_full(c); collapse_empty(c); collapse_single_pte_entry(c); @@ -1000,6 +1115,8 @@ int main(void) collapse_fork(c); collapse_fork_compound(c); collapse_max_ptes_shared(c); + if (c->cleanup_context && !c->cleanup_context()) + break; } restore_settings(0); From patchwork Tue Apr 26 14:44:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12827332 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C27FDC433FE for ; Tue, 26 Apr 2022 14:44:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 882516B0088; Tue, 26 Apr 2022 10:44:44 -0400 (EDT) Received: by 
From patchwork Tue Apr 26 14:44:12 2022
From: "Zach O'Keefe"
Date: Tue, 26 Apr 2022 07:44:12 -0700
Subject: [PATCH v3 12/12] selftests/vm: add test to verify recollapse of THPs
Message-Id: <20220426144412.742113-13-zokeefe@google.com>
In-Reply-To: <20220426144412.742113-1-zokeefe@google.com>
Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=ZV+LROzr; spf=pass (imf05.hostedemail.com: domain of 3WgVoYgcKCB8UJF99A9BJJBG9.7JHGDIPS-HHFQ57F.JMB@flex--zokeefe.bounces.google.com designates 209.85.215.202 as permitted sender) smtp.mailfrom=3WgVoYgcKCB8UJF99A9BJJBG9.7JHGDIPS-HHFQ57F.JMB@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 66DF110004D X-Stat-Signature: ffxziddmt7s6schnq3qg3u4qjmcyoqxn X-HE-Tag: 1650984277-855769 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add selftest specific to madvise collapse context that tests MADV_COLLAPSE is "successful" if a hugepage-algined/sized region is already pmd-mapped. Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 32 +++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index e0ccc9443f78..c36d04218083 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -969,6 +969,32 @@ static void collapse_max_ptes_shared(struct collapse_context *context) munmap(p, hpage_pmd_size); } +static void madvise_collapse_existing_thps(void) +{ + void *p; + int err; + + p = alloc_mapping(); + fill_memory(p, 0, hpage_pmd_size); + + printf("Collapse fully populated PTE table..."); + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); + if (err == 0 && check_huge(p)) { + success("OK"); + printf("Re-collapse PMD-mapped hugepage"); + err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); + if (err == 0 && check_huge(p)) + success("OK"); + else + fail("Fail"); + } else { + fail("Fail"); + } + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + static void madvise_collapse(const char *msg, char *p, bool expect) { int ret; @@ -1097,6 +1123,7 @@ int main(void) alloc_at_fault(); + /* Shared tests */ for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) { struct collapse_context *c = &contexts[i]; @@ -1119,5 +1146,10 @@ int main(void) break; } + /* madvise-specific tests */ + disable_khugepaged(); + madvise_collapse_existing_thps(); + enable_khugepaged(); + restore_settings(0); }