From patchwork Mon May 2 18:17:02 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Mon, 2 May 2022 11:17:02 -0700
Subject: [PATCH v4 01/13] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko, Pasha Tatashin, Peter Xu, SeongJae Park, Song Liu, Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe, "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer, "Zach O'Keefe"
Message-Id: <20220502181714.3483177-2-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>

When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PMD_MAPPED if the pmd already maps a THP. Note that SCAN_PMD_MAPPED
is different from SCAN_PAGE_COMPOUND used in the file-collapse path,
since the latter might identify pte-mapped compound pages. This is
required by MADV_COLLAPSE, which needs to know which
hugepage-aligned/sized regions are already pmd-mapped.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 include/trace/events/huge_memory.h |  1 +
 mm/internal.h                      |  1 +
 mm/khugepaged.c                    | 30 ++++++++++++++++++++++++++----
 mm/rmap.c                          | 15 +++++++++++++--
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index d651f3437367..55392bf30a03 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -11,6 +11,7 @@
 	EM( SCAN_FAIL,			"failed")			\
 	EM( SCAN_SUCCEED,		"succeeded")			\
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
+	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
diff --git a/mm/internal.h b/mm/internal.h
index 0667abd57634..51ae9f71a2a3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -172,6 +172,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 /*
  * in mm/rmap.c:
  */
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
 extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 
 /*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eb444fd45568..2c2ed6b4d96c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -28,6 +28,7 @@ enum scan_result {
 	SCAN_FAIL,
 	SCAN_SUCCEED,
 	SCAN_PMD_NULL,
+	SCAN_PMD_MAPPED,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -977,6 +978,29 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return 0;
 }
 
+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+				   unsigned long address,
+				   pmd_t **pmd)
+{
+	pmd_t pmde;
+
+	*pmd = mm_find_pmd_raw(mm, address);
+	if (!*pmd)
+		return SCAN_PMD_NULL;
+
+	pmde = pmd_read_atomic(*pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+	barrier();
+#endif
+	if (!pmd_present(pmde))
+		return SCAN_PMD_NULL;
+	if (pmd_trans_huge(pmde))
+		return SCAN_PMD_MAPPED;
+	return SCAN_SUCCEED;
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -1228,11 +1252,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}
 
 	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
diff --git a/mm/rmap.c b/mm/rmap.c
index 94d6b24a1ac2..6980b4011bf8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -759,13 +759,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 	return vma_address(page, vma);
 }
 
-pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd = NULL;
-	pmd_t pmde;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -780,6 +779,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 		goto out;
 
 	pmd = pmd_offset(pud, address);
+out:
+	return pmd;
+}
+
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+{
+	pmd_t pmde;
+	pmd_t *pmd;
+
+	pmd = mm_find_pmd_raw(mm, address);
+	if (!pmd)
+		goto out;
 	/*
 	 * Some THP functions use the sequence pmdp_huge_clear_flush(),
 	 * set_pmd_at() without holding anon_vma lock for write. So when
 	 * looking for a
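For illustration, a minimal sketch of how a caller could consume the new
SCAN_PMD_MAPPED result to test whether a hugepage-aligned region is already
pmd-mapped. The helper name here is hypothetical and not part of this patch;
find_pmd_or_thp_or_none() is the function added above:

	/* Hypothetical caller sketch -- not part of this patch. */
	static bool region_already_pmd_mapped(struct mm_struct *mm,
					      unsigned long haddr)
	{
		pmd_t *pmd;

		/* haddr must be hugepage-aligned, as in khugepaged_scan_pmd(). */
		return find_pmd_or_thp_or_none(mm, haddr, &pmd) == SCAN_PMD_MAPPED;
	}

This is exactly the kind of query MADV_COLLAPSE needs: distinguishing "already
collapsed" from "collapse failed".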
From patchwork Mon May 2 18:17:03 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Mon, 2 May 2022 11:17:03 -0700
Subject: [PATCH v4 02/13] mm/khugepaged: add struct collapse_control
Message-Id: <20220502181714.3483177-3-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>

Modularize hugepage collapse by introducing struct collapse_control.
This structure describes the properties of the requested collapse and
also serves as local scratch space used during the collapse itself.
Start by moving the global per-node khugepaged statistics into this new
structure, and stack-allocate one for the khugepaged collapse context.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 mm/khugepaged.c | 78 ++++++++++++++++++++++++++++---------------------
 1 file changed, 45 insertions(+), 33 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2c2ed6b4d96c..59357e34e7ce 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -86,6 +86,14 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 
 #define MAX_PTE_MAPPED_THP 8
 
+struct collapse_control {
+	/* Num pages scanned per node */
+	int node_load[MAX_NUMNODES];
+
+	/* Last target selected in khugepaged_find_target_node() */
+	int last_target_node;
+};
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -786,9 +794,7 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
-static int khugepaged_node_load[MAX_NUMNODES];
-
-static bool khugepaged_scan_abort(int nid)
+static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
 
@@ -800,11 +806,11 @@ static bool khugepaged_scan_abort(int nid)
 		return false;
 
 	/* If there is a count for this node already, it must be acceptable */
-	if (khugepaged_node_load[nid])
+	if (cc->node_load[nid])
 		return false;
 
 	for (i = 0; i < MAX_NUMNODES; i++) {
-		if (!khugepaged_node_load[i])
+		if (!cc->node_load[i])
 			continue;
 		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
@@ -819,28 +825,27 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
-	static int last_khugepaged_target_node = NUMA_NO_NODE;
 	int nid, target_node = 0, max_value = 0;
 
 	/* find first node with max normal pages hit */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
-		if (khugepaged_node_load[nid] > max_value) {
-			max_value = khugepaged_node_load[nid];
+		if (cc->node_load[nid] > max_value) {
+			max_value = cc->node_load[nid];
 			target_node = nid;
 		}
 
 	/* do some balance if several nodes have the same hit record */
-	if (target_node <= last_khugepaged_target_node)
-		for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES;
-		     nid++)
-			if (max_value == khugepaged_node_load[nid]) {
+	if (target_node <= cc->last_target_node)
+		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
+		     nid++)
+			if (max_value == cc->node_load[nid]) {
 				target_node = nid;
 				break;
 			}
 
-	last_khugepaged_target_node = target_node;
+	cc->last_target_node = target_node;
 	return target_node;
 }
 
@@ -878,7 +883,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	return *hpage;
 }
 #else
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -1238,7 +1243,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage)
+			       struct page **hpage,
+			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1256,7 +1262,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1322,16 +1328,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 
 		/*
 		 * Record which node the original page is from and save this
-		 * information to khugepaged_node_load[].
+		 * information to cc->node_load[].
 		 * Khugepaged will allocate hugepage from the node has the
 		 * max hit record.
 		 */
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
 			goto out_unmap;
@@ -1382,7 +1388,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node();
+		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		collapse_huge_page(mm, address, hpage, node,
 				   referenced, unmapped);
@@ -2034,7 +2040,8 @@ static void collapse_file(struct mm_struct *mm,
 }
 
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2045,7 +2052,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 
 	present = 0;
 	swap = 0;
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
@@ -2070,11 +2077,11 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		}
 
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			break;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
@@ -2107,7 +2114,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node();
+			node = khugepaged_find_target_node(cc);
 			collapse_file(mm, file, start, hpage, node);
 		}
 	}
@@ -2116,7 +2123,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2127,7 +2135,8 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage)
+					    struct page **hpage,
+					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
 {
@@ -2203,12 +2212,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage);
+				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
 						khugepaged_scan.address,
-						hpage);
+						hpage, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2264,7 +2273,7 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }
 
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan(struct collapse_control *cc)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2288,7 +2297,7 @@ static void khugepaged_do_scan(void)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage);
+							    &hpage, cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2327,12 +2336,15 @@ static void khugepaged_wait_work(void)
 static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
+	struct collapse_control cc = {
+		.last_target_node = NUMA_NO_NODE,
+	};
 
 	set_freezable();
 	set_user_nice(current, MAX_NICE);
 
 	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
+		khugepaged_do_scan(&cc);
 		khugepaged_wait_work();
 	}
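For illustration, a minimal sketch of what this refactor buys: because the
scratch state now lives in a stack-allocated struct collapse_control rather
than in globals, a second collapse context (e.g. the madvise path this series
builds toward) could run with its own independent state. The second caller
below is hypothetical, not part of this patch:

	/* Hypothetical second context alongside the khugepaged kthread. */
	struct collapse_control cc = {
		.last_target_node = NUMA_NO_NODE,
	};

	/*
	 * cc.node_load[] is zeroed at the start of each scan (see the
	 * memset()s above), so independent contexts don't interfere.
	 */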
From patchwork Mon May 2 18:17:04 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Mon, 2 May 2022 11:17:04 -0700
Subject: [PATCH v4 03/13] mm/khugepaged: dedup and simplify hugepage alloc and charging
Message-Id: <20220502181714.3483177-4-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>

The following code is duplicated in collapse_huge_page() and
collapse_file():

	/* Only allocate from the target node */
	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;

	new_page = khugepaged_alloc_page(hpage, gfp, node);
	if (!new_page) {
		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
		goto out;
	}

	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
		result = SCAN_CGROUP_CHARGE_FAIL;
		goto out;
	}
	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

Also, "node" is passed as an argument to both collapse_huge_page() and
collapse_file() and obtained the same way, via
khugepaged_find_target_node().

Move all this into a new helper, alloc_charge_hpage(), and remove the
duplicate code from collapse_huge_page() and collapse_file(). Also,
simplify khugepaged_alloc_page() by returning a bool indicating
allocation success instead of a copy of the (possibly) allocated
struct page.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
This patch depends on 'mm/khugepaged: sched to numa node when collapse
huge page' currently being discussed upstream[1], and anticipates that
this functionality would be equally applicable to file-backed collapse.
It also wraps this code in a CONFIG_NUMA #ifdef.

[1] https://lore.kernel.org/linux-mm/20220317065024.2635069-1-maobibo@loongson.cn/

 mm/khugepaged.c | 99 +++++++++++++++++++++++--------------------------
 1 file changed, 46 insertions(+), 53 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 59357e34e7ce..b05fb9a85eab 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -866,8 +866,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
 	VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -875,12 +874,12 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
-		return NULL;
+		return false;
 	}
 
 	prep_transhuge_page(*hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
-	return *hpage;
+	return true;
 }
 #else
 static int khugepaged_find_target_node(struct collapse_control *cc)
@@ -942,12 +941,11 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
 	VM_BUG_ON(!*hpage);
 
-	return *hpage;
+	return true;
 }
 #endif
 
@@ -1069,10 +1067,34 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       int node, int referenced, int unmapped)
+static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
+			      struct collapse_control *cc)
+{
+#ifdef CONFIG_NUMA
+	const struct cpumask *cpumask;
+#endif
+	gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+	int node = khugepaged_find_target_node(cc);
+
+#ifdef CONFIG_NUMA
+	/* sched to specified node before huge page memory copy */
+	if (task_node(current) != node) {
+		cpumask = cpumask_of_node(node);
+		if (!cpumask_empty(cpumask))
+			set_cpus_allowed_ptr(current, cpumask);
+	}
+#endif
+	if (!khugepaged_alloc_page(hpage, gfp, node))
+		return SCAN_ALLOC_HUGE_PAGE_FAIL;
+	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
+		return SCAN_CGROUP_CHARGE_FAIL;
+	count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+	return SCAN_SUCCEED;
+}
+
+static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			       struct page **hpage, int referenced,
+			       int unmapped, struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1083,14 +1105,9 @@ static void collapse_huge_page(struct mm_struct *mm,
 	int isolated = 0, result = 0;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
-	gfp_t gfp;
-	const struct cpumask *cpumask;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
 	/*
 	 * Before allocating the hugepage, release the mmap_lock read lock.
 	 * The allocation can take potentially a long time if it involves
@@ -1099,23 +1116,11 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	mmap_read_unlock(mm);
 
-	/* sched to specified node before huage page memory copy */
-	if (task_node(current) != node) {
-		cpumask = cpumask_of_node(node);
-		if (!cpumask_empty(cpumask))
-			set_cpus_allowed_ptr(current, cpumask);
-	}
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+	result = alloc_charge_hpage(hpage, mm, cc);
+	if (result != SCAN_SUCCEED)
 		goto out_nolock;
-	}
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
-		goto out_nolock;
-	}
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+
+	new_page = *hpage;
 
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
@@ -1388,10 +1393,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, node,
-				   referenced, unmapped);
+		collapse_huge_page(mm, address, hpage, referenced, unmapped,
+				   cc);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1657,7 +1661,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @file: file that collapse on
  * @start: collapse start address
  * @hpage: new allocated huge page for collapse
- * @node: appointed node the new huge page allocate from
+ * @cc: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and lock a new huge page;
@@ -1674,12 +1678,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *    + restore gaps in the page cache;
  *    + unlock and free huge page;
  */
-static void collapse_file(struct mm_struct *mm,
-			  struct file *file, pgoff_t start,
-			  struct page **hpage, int node)
+static void collapse_file(struct mm_struct *mm, struct file *file,
+			  pgoff_t start, struct page **hpage,
+			  struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
-	gfp_t gfp;
 	struct page *new_page;
 	pgoff_t index, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
@@ -1691,20 +1694,11 @@ static void collapse_file(struct mm_struct *mm,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+	result = alloc_charge_hpage(hpage, mm, cc);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
-		goto out;
-	}
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+
+	new_page = *hpage;
 
 	/*
 	 * Ensure we have slots for all the pages in the range. This is
@@ -2114,8 +2108,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node(cc);
-			collapse_file(mm, file, start, hpage, node);
+			collapse_file(mm, file, start, hpage, cc);
 		}
 	}
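After this change, both collapse paths reduce to the same caller-side
pattern (as in the hunks above):

	result = alloc_charge_hpage(hpage, mm, cc);
	if (result != SCAN_SUCCEED)
		goto out;

	new_page = *hpage;

Target-node selection, the NUMA task migration, allocation, and memcg
charging all live in one place, so a later context only has to swap out the
helper rather than re-duplicate this sequence.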
From patchwork Mon May 2 18:17:05 2022
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Mon, 2 May 2022 11:17:05 -0700
Subject: [PATCH v4 04/13] mm/khugepaged: make hugepage allocation context-specific
Message-Id: <20220502181714.3483177-5-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>

Add a hook to struct collapse_control that allows contexts to define
their own allocation semantics and charging logic. For example,
khugepaged has specific NUMA and UMA implementations, as well as gfp
flags tied to /sys/kernel/mm/transparent_hugepage/khugepaged/defrag.

Additionally, move the [pre]allocated hugepage pointer into struct
collapse_control.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: kernel test robot
---
 mm/khugepaged.c | 85 ++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b05fb9a85eab..755c40fe87d2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,6 +92,10 @@ struct collapse_control {
 
 	/* Last target selected in khugepaged_find_target_node() */
 	int last_target_node;
+
+	struct page *hpage;
+	int (*alloc_charge_hpage)(struct mm_struct *mm,
+				  struct collapse_control *cc);
 };
 
 /**
 * struct mm_slot - hash lookup from mm to mm_slot
 * @hash: hash collision list
@@ -866,18 +870,19 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }
 
-static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(gfp_t gfp, int node,
+				  struct collapse_control *cc)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
+	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);
 
-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!*hpage)) {
+	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	if (unlikely(!cc->hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		*hpage = ERR_PTR(-ENOMEM);
+		cc->hpage = ERR_PTR(-ENOMEM);
 		return false;
 	}
 
-	prep_transhuge_page(*hpage);
+	prep_transhuge_page(cc->hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
 	return true;
 }
@@ -1067,8 +1072,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
-			      struct collapse_control *cc)
+static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc)
 {
 #ifdef CONFIG_NUMA
 	const struct cpumask *cpumask;
@@ -1084,17 +1088,17 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
 			set_cpus_allowed_ptr(current, cpumask);
 	}
 #endif
-	if (!khugepaged_alloc_page(hpage, gfp, node))
+	if (!khugepaged_alloc_page(gfp, node, cc))
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
-	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
+	if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, gfp)))
 		return SCAN_CGROUP_CHARGE_FAIL;
-	count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+	count_memcg_page_event(cc->hpage, THP_COLLAPSE_ALLOC);
 	return SCAN_SUCCEED;
 }
 
 static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			       struct page **hpage, int referenced,
-			       int unmapped, struct collapse_control *cc)
+			       int referenced, int unmapped,
+			       struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1116,11 +1120,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = cc->alloc_charge_hpage(mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
-	new_page = *hpage;
+	new_page = cc->hpage;
 
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
@@ -1232,15 +1236,15 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);
 
-	*hpage = NULL;
+	cc->hpage = NULL;
 
 	khugepaged_pages_collapsed++;
 	result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
@@ -1248,7 +1252,6 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage,
 			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
@@ -1394,8 +1397,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, referenced, unmapped,
-				   cc);
+		collapse_huge_page(mm, address, referenced, unmapped, cc);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1660,7 +1662,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 * @mm: process address space where collapse happens
 * @file: file that collapse on
 * @start: collapse start address
- * @hpage: new allocated huge page for collapse
 * @cc: collapse context and scratchpad
 *
 * Basic scheme is simple, details are more complex:
@@ -1679,8 +1680,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 *    + restore gaps in the page cache;
 *    + unlock and free huge page;
 */
 static void collapse_file(struct mm_struct *mm, struct file *file,
-			  pgoff_t start, struct page **hpage,
-			  struct collapse_control *cc)
+			  pgoff_t start, struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *new_page;
@@ -1694,11 +1694,11 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = cc->alloc_charge_hpage(mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
-	new_page = *hpage;
+	new_page = cc->hpage;
 
 	/*
 	 * Ensure we have slots for all the pages in the range. This is
@@ -1981,7 +1981,7 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	 * Remove pte page tables, so we can re-fault the page as huge.
 	 */
 	retract_page_tables(mapping, start);
-	*hpage = NULL;
+	cc->hpage = NULL;
 
 	khugepaged_pages_collapsed++;
 	} else {
@@ -2028,14 +2028,14 @@ static void collapse_file(struct mm_struct *mm, struct file *file,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	/* TODO: tracepoints */
 }
 
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+		struct file *file, pgoff_t start,
+		struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2108,7 +2108,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			collapse_file(mm, file, start, hpage, cc);
+			collapse_file(mm, file, start, cc);
 		}
 	}
 
@@ -2116,8 +2116,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+		struct file *file, pgoff_t start,
+		struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2128,7 +2128,6 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2205,12 +2204,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
+				khugepaged_scan_file(mm, file, pgoff, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
-						khugepaged_scan.address,
-						hpage, cc);
+						khugepaged_scan.address, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2268,15 +2266,15 @@ static int khugepaged_wait_event(void)
 
 static void khugepaged_do_scan(struct collapse_control *cc)
 {
-	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
 	bool wait = true;
 
+	cc->hpage = NULL;
 	lru_add_drain_all();
 
 	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
+		if (!khugepaged_prealloc_page(&cc->hpage, &wait))
 			break;
 
 		cond_resched();
@@ -2290,14 +2288,14 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage, cc);
+							    cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
 	}
 
-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		put_page(cc->hpage);
 }
 
 static bool khugepaged_should_wakeup(void)
@@ -2331,6 +2329,7 @@ static int khugepaged(void *none)
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
 		.last_target_node = NUMA_NO_NODE,
+		.alloc_charge_hpage = &alloc_charge_hpage,
 	};
 
 	set_freezable();
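For illustration, a minimal sketch of what the new hook enables: a context can
supply its own allocation/charging semantics by pointing alloc_charge_hpage at
a different function. The madvise-side function name below is hypothetical and
not part of this patch:

	/* Hypothetical context with its own allocation semantics. */
	static int madvise_alloc_charge_hpage(struct mm_struct *mm,
					      struct collapse_control *cc);

	struct collapse_control cc = {
		.last_target_node = NUMA_NO_NODE,
		.alloc_charge_hpage = &madvise_alloc_charge_hpage,
	};

The collapse code itself stays unchanged; only the function pointer in the
stack-allocated control structure differs per context.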
18:17:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 80AE46B007B; Mon, 2 May 2022 14:17:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 795266B007D; Mon, 2 May 2022 14:17:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E6826B007E; Mon, 2 May 2022 14:17:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 4C4766B007B for ; Mon, 2 May 2022 14:17:38 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 24D47212C2 for ; Mon, 2 May 2022 18:17:38 +0000 (UTC) X-FDA: 79421610996.16.525877B Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) by imf23.hostedemail.com (Postfix) with ESMTP id DF95014007F for ; Mon, 2 May 2022 18:17:28 +0000 (UTC) Received: by mail-pj1-f74.google.com with SMTP id u1-20020a17090a2b8100b001d9325a862fso6682660pjd.6 for ; Mon, 02 May 2022 11:17:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Mc0jQtxfHOk2MMk4Pq3M5BxxvHSx5u4cNJ3auov5auk=; b=VKbjOcB2dOT6nDv43YVrPaUddUJZTNq9BVbVT4vaXZItPTrT4NOkmR/8hJ6tWioWap 7LC5lU8O8+t6FLtnaIfXuLp96TlU+rTfxwr0GlvCncMalgeJt/e1qWNTU7Cr8GL2gDJM ezlKlDSTS9ypI/L4Dt6ZPvnnEU3y7QGoIMuEUr4yaKH0bubov1MkyqOwfbhFQblfkwGQ MPNtrKqTuqh5HRgDGPj0Ra+GRy6pwW9Le3mYCmHKJKAkRBxXX23UjzBKE9uF8TByxvhY TMr1MkShklHEK8bTi2wAwClcdlb1u7qSrIGz28OKo8rtLJYvXmA4w/QNTmxOJyF4/gAE krmw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Mc0jQtxfHOk2MMk4Pq3M5BxxvHSx5u4cNJ3auov5auk=; b=hVhk301zuMmlgpwwaztxt9WiKHp7xygfM+NNVxHxSSDdz4xPV7ZKw2GrADNo6+oMzE FTLEBGbA9Yr57EbLZojouY0EqDPW4729PadPKKKglceLGu+eE50Oh/QlJH6wF3V1H3zk zMB2D6izSbxSKwcPYmUAK86kyrt4L4lUPuWoJUw5vIzehqCfGWexbGbQqSB/2g/t2ZQl p1q2gvWN5c6tj+yK+ItBIkLTGi3ArUoUfOVuOK5mx8VeWZ4MdrGVK76KxrHElDIMn+/F g8r0HoILRKFa92vXOAtwon1a/SgMMxPfI7jhqqu6sMbTDkkMxp3aN494iPquVPTXu/hZ kZJA== X-Gm-Message-State: AOAM5317T4k02x1OHyeFuxk9Uj8SVCBYJViMzIGQaGvR0u1g6VJ71qXX GLMq9+Fxvh0A7bn/uCqcWF9B4y4hEoXV X-Google-Smtp-Source: ABdhPJy+n64A1NiJZsFj1Qu8Gs4GxbDyC7bZUW76Un87oKbkyh2zaIp8hYq9Iznlxvy0Ka+5KBQq/5CAq6YQ X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:90a:e510:b0:1d9:ee23:9fa1 with SMTP id t16-20020a17090ae51000b001d9ee239fa1mr88442pjy.0.1651515455636; Mon, 02 May 2022 11:17:35 -0700 (PDT) Date: Mon, 2 May 2022 11:17:06 -0700 In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com> Message-Id: <20220502181714.3483177-6-zokeefe@google.com> Mime-Version: 1.0 References: <20220502181714.3483177-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v4 05/13] mm/khugepaged: pipe enum scan_result codes back to callers From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. 
Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Stat-Signature: th4joauyoch4z7wbs5yrokmjddbeqgpf Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=VKbjOcB2; spf=pass (imf23.hostedemail.com: domain of 3PyBwYgcKCFoRGC66768GG8D6.4GEDAFMP-EECN24C.GJ8@flex--zokeefe.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=3PyBwYgcKCFoRGC66768GG8D6.4GEDAFMP-EECN24C.GJ8@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: DF95014007F X-HE-Tag: 1651515448-215181 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pipe enum scan_result codes back through return values of functions downstream of khugepaged_scan_file() and khugepaged_scan_pmd() to inform callers if the operation was successful, and if not, why. Since khugepaged_scan_pmd()'s return value already has a specific meaning (whether mmap_lock was unlocked or not), add a bool* argument to khugepaged_scan_pmd() to retrieve this information. Change khugepaged to take action based on the return values of khugepaged_scan_file() and khugepaged_scan_pmd() instead of acting deep within the collapsing functions themselves. Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 85 +++++++++++++++++++++++++++---------------------- 1 file changed, 47 insertions(+), 38 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 755c40fe87d2..986344a04165 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -732,13 +732,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_SUCCEED; trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); - return 1; + return SCAN_SUCCEED; } out: release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); - return 0; + return result; } static void __collapse_huge_page_copy(pte_t *pte, struct page *page, @@ -1096,9 +1096,9 @@ static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) return SCAN_SUCCEED; } -static void collapse_huge_page(struct mm_struct *mm, unsigned long address, - int referenced, int unmapped, - struct collapse_control *cc) +static int collapse_huge_page(struct mm_struct *mm, unsigned long address, + int referenced, int unmapped, + struct collapse_control *cc) { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; @@ -1106,7 +1106,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, pgtable_t pgtable; struct page *new_page; spinlock_t *pmd_ptl, *pte_ptl; - int isolated = 0, result = 0; + int result = SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; @@ -1186,11 +1186,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - isolated = __collapse_huge_page_isolate(vma, address, pte, - &compound_pagelist); + result = __collapse_huge_page_isolate(vma, address, pte, + &compound_pagelist); spin_unlock(pte_ptl); - if (unlikely(!isolated)) { + if (unlikely(result != SCAN_SUCCEED)) { pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); @@ -1238,25 +1238,23 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, cc->hpage = NULL; - 
khugepaged_pages_collapsed++; result = SCAN_SUCCEED; out_up_write: mmap_write_unlock(mm); out_nolock: if (!IS_ERR_OR_NULL(cc->hpage)) mem_cgroup_uncharge(page_folio(cc->hpage)); - trace_mm_collapse_huge_page(mm, isolated, result); - return; + trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result); + return result; } -static int khugepaged_scan_pmd(struct mm_struct *mm, - struct vm_area_struct *vma, - unsigned long address, +static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long address, bool *mmap_locked, struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; - int ret = 0, result = 0, referenced = 0; + int result = SCAN_FAIL, referenced = 0; int none_or_zero = 0, shared = 0; struct page *page = NULL; unsigned long _address; @@ -1266,6 +1264,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, VM_BUG_ON(address & ~HPAGE_PMD_MASK); + *mmap_locked = true; + result = find_pmd_or_thp_or_none(mm, address, &pmd); if (result != SCAN_SUCCEED) goto out; @@ -1391,18 +1391,22 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; - ret = 1; } out_unmap: pte_unmap_unlock(pte, ptl); - if (ret) { - /* collapse_huge_page will return with the mmap_lock released */ - collapse_huge_page(mm, address, referenced, unmapped, cc); + if (result == SCAN_SUCCEED) { + /* + * collapse_huge_page() will return with the mmap_lock released + * - so let the caller know mmap_lock was dropped + */ + *mmap_locked = false; + result = collapse_huge_page(mm, address, referenced, + unmapped, cc); } out: trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, none_or_zero, result, unmapped); - return ret; + return result; } static void collect_mm_slot(struct mm_slot *mm_slot) @@ -1679,8 +1683,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * + restore gaps in the page cache; * + unlock and free huge page; */ -static void collapse_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct collapse_control *cc) +static int collapse_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct collapse_control *cc) { struct address_space *mapping = file->f_mapping; struct page *new_page; @@ -1982,8 +1986,6 @@ static void collapse_file(struct mm_struct *mm, struct file *file, */ retract_page_tables(mapping, start); cc->hpage = NULL; - - khugepaged_pages_collapsed++; } else { struct page *page; @@ -2031,11 +2033,12 @@ static void collapse_file(struct mm_struct *mm, struct file *file, if (!IS_ERR_OR_NULL(cc->hpage)) mem_cgroup_uncharge(page_folio(cc->hpage)); /* TODO: tracepoints */ + return result; } -static void khugepaged_scan_file(struct mm_struct *mm, - struct file *file, pgoff_t start, - struct collapse_control *cc) +static int khugepaged_scan_file(struct mm_struct *mm, + struct file *file, pgoff_t start, + struct collapse_control *cc) { struct page *page = NULL; struct address_space *mapping = file->f_mapping; @@ -2108,16 +2111,16 @@ static void khugepaged_scan_file(struct mm_struct *mm, result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { - collapse_file(mm, file, start, cc); + result = collapse_file(mm, file, start, cc); } } /* TODO: tracepoints */ + return result; } #else -static void khugepaged_scan_file(struct mm_struct *mm, - struct file *file, pgoff_t start, - struct collapse_control *cc) +static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, pgoff_t start, + struct collapse_control *cc) 
{ BUILD_BUG(); } @@ -2189,7 +2192,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, goto skip; while (khugepaged_scan.address < hend) { - int ret; + int result; + bool mmap_locked; + cond_resched(); if (unlikely(khugepaged_test_exit(mm))) goto breakouterloop; @@ -2203,17 +2208,21 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, khugepaged_scan.address); mmap_read_unlock(mm); - ret = 1; - khugepaged_scan_file(mm, file, pgoff, cc); + mmap_locked = false; + result = khugepaged_scan_file(mm, file, pgoff, + cc); fput(file); } else { - ret = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, cc); + result = khugepaged_scan_pmd(mm, vma, + khugepaged_scan.address, + &mmap_locked, cc); } + if (result == SCAN_SUCCEED) + ++khugepaged_pages_collapsed; /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; progress += HPAGE_PMD_NR; - if (ret) + if (!mmap_locked) /* we released mmap_lock so break loop */ goto breakouterloop_mmap_lock; if (progress >= pages)

From patchwork Mon May 2 18:17:07 2022
Subject: [PATCH v4 06/13] mm/khugepaged: add flag to ignore khugepaged_max_ptes_*
From: "Zach O'Keefe"
Date: Mon, 2 May 2022 11:17:07 -0700
Message-Id: <20220502181714.3483177-7-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>

Add an enforce_pte_scan_limits flag to struct collapse_control that allows a context to ignore the sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared], and set this flag in the khugepaged collapse context to preserve existing khugepaged behavior. This flag will be unset in the upcoming madvise collapse context, since there the user presumably has reason to believe the collapse will be beneficial, and khugepaged heuristics shouldn't tell the user they are wrong.
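As a compilable illustration of the pattern the diff below applies at each max_ptes_* check (the struct and knob here are simplified stand-ins for the kernel's, and 511 assumes the x86-64 default of HPAGE_PMD_NR - 1):

#include <stdbool.h>

/* Stand-ins for the kernel definitions (assumptions for illustration). */
struct collapse_control {
	bool enforce_pte_scan_limits;
};
static int khugepaged_max_ptes_none = 511;	/* HPAGE_PMD_NR - 1 on x86-64 */

/*
 * The sysfs limit only applies when the context opts in; a context
 * with enforce_pte_scan_limits == false never fails this check.
 */
static bool exceeds_none_limit(int none_or_zero,
			       const struct collapse_control *cc)
{
	return cc->enforce_pte_scan_limits &&
	       none_or_zero > khugepaged_max_ptes_none;
}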
Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 986344a04165..94f18be83835 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -87,6 +87,9 @@ static struct kmem_cache *mm_slot_cache __read_mostly; #define MAX_PTE_MAPPED_THP 8 struct collapse_control { + /* Respect khugepaged_max_ptes_[none|swap|shared] */ + bool enforce_pte_scan_limits; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -614,6 +617,7 @@ static bool is_refcount_suitable(struct page *page) static int __collapse_huge_page_isolate(struct vm_area_struct *vma, unsigned long address, pte_t *pte, + struct collapse_control *cc, struct list_head *compound_pagelist) { struct page *page = NULL; @@ -627,7 +631,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (pte_none(pteval) || (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->enforce_pte_scan_limits)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -647,8 +652,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageAnon(page), page); - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->enforce_pte_scan_limits && page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -1186,7 +1191,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - result = __collapse_huge_page_isolate(vma, address, pte, + result = __collapse_huge_page_isolate(vma, address, pte, cc, &compound_pagelist); spin_unlock(pte_ptl); @@ -1276,7 +1281,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, _pte++, _address += PAGE_SIZE) { pte_t pteval = *_pte; if (is_swap_pte(pteval)) { - if (++unmapped <= khugepaged_max_ptes_swap) { + if (++unmapped <= khugepaged_max_ptes_swap || + !cc->enforce_pte_scan_limits) { /* * Always be strict with uffd-wp * enabled swap entries. 
Please see @@ -1295,7 +1301,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->enforce_pte_scan_limits)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -1325,8 +1332,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, goto out_unmap; } - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->enforce_pte_scan_limits && + page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out_unmap; @@ -2056,7 +2064,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, continue; if (xa_is_value(page)) { - if (++swap > khugepaged_max_ptes_swap) { + if (cc->enforce_pte_scan_limits && + ++swap > khugepaged_max_ptes_swap) { result = SCAN_EXCEED_SWAP_PTE; count_vm_event(THP_SCAN_EXCEED_SWAP_PTE); break; @@ -2107,7 +2116,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, rcu_read_unlock(); if (result == SCAN_SUCCEED) { - if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) { + if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none && + cc->enforce_pte_scan_limits) { result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { @@ -2337,6 +2347,7 @@ static int khugepaged(void *none) { struct mm_slot *mm_slot; struct collapse_control cc = { + .enforce_pte_scan_limits = true, .last_target_node = NUMA_NO_NODE, .alloc_charge_hpage = &alloc_charge_hpage, };

From patchwork Mon May 2 18:17:08 2022
Subject: [PATCH v4 07/13] mm/khugepaged: add flag to ignore page young/referenced requirement
From: "Zach O'Keefe"
Date: Mon, 2 May 2022 11:17:08 -0700
Message-Id: <20220502181714.3483177-8-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>

Add an enforce_young flag to struct collapse_control that allows a context to ignore the requirement that some pages in the region being collapsed be young or referenced. Set this flag in the khugepaged collapse context to preserve existing khugepaged behavior. This flag will be unset in the upcoming madvise collapse context, since there the user presumably has reason to believe the collapse will be beneficial, and khugepaged heuristics shouldn't tell the user they are wrong.
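Sketched the same way, with stand-in types rather than the kernel's, the young/referenced requirement becomes conditional on the context:

#include <stdbool.h>

/* Stand-in for the kernel struct (assumption for illustration). */
struct collapse_control {
	bool enforce_young;
};

/*
 * With enforce_young clear, a region with zero young/referenced pages
 * stays eligible; with it set, khugepaged's existing heuristic applies.
 */
static bool lacks_referenced_pages(int referenced,
				   const struct collapse_control *cc)
{
	return cc->enforce_young && referenced == 0;
}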
Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 94f18be83835..b57a4a643053 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -90,6 +90,9 @@ struct collapse_control { /* Respect khugepaged_max_ptes_[none|swap|shared] */ bool enforce_pte_scan_limits; + /* Require memory to be young */ + bool enforce_young; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -720,9 +723,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, list_add_tail(&page->lru, compound_pagelist); next: /* There should be enough young pte to collapse the page */ - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + if (cc->enforce_young && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; if (pte_write(pteval)) @@ -731,7 +735,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (unlikely(!writable)) { result = SCAN_PAGE_RO; - } else if (unlikely(!referenced)) { + } else if (unlikely(cc->enforce_young && !referenced)) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -1388,14 +1392,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, result = SCAN_PAGE_COUNT; goto out_unmap; } - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + if (cc->enforce_young && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; } if (!writable) { result = SCAN_PAGE_RO; - } else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) { + } else if (cc->enforce_young && (!referenced || (unmapped && referenced + < HPAGE_PMD_NR / 2))) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -2348,6 +2354,7 @@ static int khugepaged(void *none) struct mm_slot *mm_slot; struct collapse_control cc = { .enforce_pte_scan_limits = true, + .enforce_young = true, .last_target_node = NUMA_NO_NODE, .alloc_charge_hpage = &alloc_charge_hpage, };

From patchwork Mon May 2 18:17:09 2022
Subject: [PATCH v4 08/13] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
From: "Zach O'Keefe"
Date: Mon, 2 May 2022 11:17:09 -0700
Message-Id: <20220502181714.3483177-9-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>

This idea was introduced by David Rientjes[1].

Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are:

* CPU is charged to the process that wants to spend the cycles for the THP
* Avoid unpredictable timing of khugepaged collapse

Immediate users of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED, zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory with THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2].

Only privately-mapped anon memory is supported for now, but it is expected that file and shmem support will be added later to support the use-case of backing executable text by THPs. The current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system, which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevent page sharing and demand paging, both of which increase steady-state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: peak upfront performance and lower RAM footprints.

This call respects THP eligibility as determined by the system-wide /sys/kernel/mm/transparent_hugepage/enabled sysfs settings and the VMA flags of the memory range being collapsed. THP allocation may enter direct reclaim and/or compaction.
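A minimal userspace sketch of the intended usage (assuming a 2MiB pmd size, as on x86-64, and the MADV_COLLAPSE value added by this patch; error handling trimmed):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value added by this patch */
#endif

#define HPAGE_SIZE (2UL << 20)	/* assumes 2MiB pmd-sized pages (x86-64) */

int main(void)
{
	/* Over-allocate so a hugepage-aligned 2MiB window always fits. */
	size_t len = 2 * HPAGE_SIZE;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	char *hp = (char *)(((uintptr_t)p + HPAGE_SIZE - 1) &
			    ~(HPAGE_SIZE - 1));
	memset(hp, 1, HPAGE_SIZE);	/* fault in the 512 small pages */

	/* Ask for a synchronous collapse of the region into a THP. */
	if (madvise(hp, HPAGE_SIZE, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");

	munmap(p, len);
	return 0;
}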
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc Suggested-by: David Rientjes Signed-off-by: Zach O'Keefe Reported-by: kernel test robot --- arch/alpha/include/uapi/asm/mman.h | 2 + arch/mips/include/uapi/asm/mman.h | 2 + arch/parisc/include/uapi/asm/mman.h | 2 + arch/xtensa/include/uapi/asm/mman.h | 2 + include/linux/huge_mm.h | 12 ++ include/uapi/asm-generic/mman-common.h | 2 + mm/khugepaged.c | 166 +++++++++++++++++++++++-- mm/madvise.c | 5 + 8 files changed, 181 insertions(+), 12 deletions(-) diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 4aa996423b0d..763929e814e9 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -76,6 +76,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index 1be428663c10..c6e1fc77c996 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -103,6 +103,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index a7ea3204a5fa..22133a6a506e 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -70,6 +70,8 @@ #define MADV_WIPEONFORK 71 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 72 /* Undo MADV_WIPEONFORK */ +#define MADV_COLLAPSE 73 /* Synchronous hugepage collapse */ + #define MADV_HWPOISON 100 /* poison a page for testing */ #define MADV_SOFT_OFFLINE 101 /* soft offline page for testing */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index 7966a58af472..1ff0c858544f 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -111,6 +111,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 9a26bd10e083..4a2ea1b5437c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -222,6 +222,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); +int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -378,6 +381,15 @@ static inline int hugepage_madvise(struct vm_area_struct *vma, BUG(); return 0; } + +static inline int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + BUG(); + return 0; +} + static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6c1aa92a92e4..6ce1f1ceb432 100644 --- 
a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -77,6 +77,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b57a4a643053..3ba2c570da5e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -837,6 +837,22 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void) return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT; } +static bool alloc_hpage(gfp_t gfp, int node, struct collapse_control *cc) +{ + VM_BUG_ON_PAGE(cc->hpage, cc->hpage); + + cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); + if (unlikely(!cc->hpage)) { + count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + cc->hpage = ERR_PTR(-ENOMEM); + return false; + } + + prep_transhuge_page(cc->hpage); + count_vm_event(THP_COLLAPSE_ALLOC); + return true; +} + #ifdef CONFIG_NUMA static int khugepaged_find_target_node(struct collapse_control *cc) { @@ -882,18 +898,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) static bool khugepaged_alloc_page(gfp_t gfp, int node, struct collapse_control *cc) { - VM_BUG_ON_PAGE(cc->hpage, cc->hpage); - - cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); - if (unlikely(!cc->hpage)) { - count_vm_event(THP_COLLAPSE_ALLOC_FAILED); - cc->hpage = ERR_PTR(-ENOMEM); - return false; - } - - prep_transhuge_page(cc->hpage); - count_vm_event(THP_COLLAPSE_ALLOC); - return true; + return alloc_hpage(gfp, node, cc); } #else static int khugepaged_find_target_node(struct collapse_control *cc) @@ -2462,3 +2467,140 @@ void khugepaged_min_free_kbytes_update(void) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } + +static void madvise_collapse_cleanup_page(struct page **hpage) +{ + if (!IS_ERR(*hpage) && *hpage) + put_page(*hpage); + *hpage = NULL; +} + +static int madvise_collapse_errno(enum scan_result r) +{ + switch (r) { + case SCAN_PMD_NULL: + case SCAN_ADDRESS_RANGE: + case SCAN_VMA_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PAGE_NULL: + /* + * Addresses in the specified range are not currently mapped, + * or are outside the AS of the process. + */ + return -ENOMEM; + case SCAN_ALLOC_HUGE_PAGE_FAIL: + case SCAN_CGROUP_CHARGE_FAIL: + /* A kernel resource was temporarily unavailable. 
*/ + return -EAGAIN; + default: + return -EINVAL; + } +} + +static int madvise_alloc_charge_hpage(struct mm_struct *mm, + struct collapse_control *cc) +{ + if (!alloc_hpage(GFP_TRANSHUGE, khugepaged_find_target_node(cc), cc)) + return SCAN_ALLOC_HUGE_PAGE_FAIL; + if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, + GFP_TRANSHUGE))) + return SCAN_CGROUP_CHARGE_FAIL; + count_memcg_page_event(cc->hpage, THP_COLLAPSE_ALLOC); + return SCAN_SUCCEED; +} + +int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + struct collapse_control cc = { + .enforce_pte_scan_limits = false, + .enforce_young = false, + .last_target_node = NUMA_NO_NODE, + .hpage = NULL, + .alloc_charge_hpage = &madvise_alloc_charge_hpage, + }; + struct mm_struct *mm = vma->vm_mm; + unsigned long hstart, hend, addr; + int thps = 0, nr_hpages = 0, result = SCAN_FAIL; + bool mmap_locked = true; + + BUG_ON(vma->vm_start > start); + BUG_ON(vma->vm_end < end); + + *prev = vma; + + if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) + return -EINVAL; + + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; + hend = end & HPAGE_PMD_MASK; + nr_hpages = (hend - hstart) >> HPAGE_PMD_SHIFT; + + if (hstart >= hend || !transparent_hugepage_active(vma)) + return -EINVAL; + + mmgrab(mm); + lru_add_drain(); + + for (addr = hstart; ; ) { + mmap_assert_locked(mm); + cond_resched(); + result = SCAN_FAIL; + + if (unlikely(khugepaged_test_exit(mm))) { + result = SCAN_ANY_PROCESS; + break; + } + + memset(cc.node_load, 0, sizeof(cc.node_load)); + result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, &cc); + if (!mmap_locked) + *prev = NULL; /* tell madvise we dropped mmap_lock */ + + switch (result) { + /* Whitelisted set of results where continuing OK */ + case SCAN_SUCCEED: + case SCAN_PMD_MAPPED: + ++thps; + case SCAN_PMD_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PTE_UFFD_WP: + case SCAN_PAGE_RO: + case SCAN_LACK_REFERENCED_PAGE: + case SCAN_PAGE_NULL: + case SCAN_PAGE_COUNT: + case SCAN_PAGE_LOCK: + case SCAN_PAGE_COMPOUND: + break; + case SCAN_PAGE_LRU: + lru_add_drain_all(); + goto retry; + default: + /* Other error, exit */ + goto break_loop; + } + addr += HPAGE_PMD_SIZE; + if (addr >= hend) + break; +retry: + if (!mmap_locked) { + mmap_read_lock(mm); + mmap_locked = true; + result = hugepage_vma_revalidate(mm, addr, &vma); + if (result) + goto out; + } + madvise_collapse_cleanup_page(&cc.hpage); + } + +break_loop: + /* madvise_walk_vmas() expects us to hold mmap_lock on return */ + if (!mmap_locked) + mmap_read_lock(mm); +out: + mmap_assert_locked(mm); + madvise_collapse_cleanup_page(&cc.hpage); + mmdrop(mm); + + return thps == nr_hpages ? 0 : madvise_collapse_errno(result); +} diff --git a/mm/madvise.c b/mm/madvise.c index 5f4537511532..638517952bd2 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_FREE: case MADV_POPULATE_READ: case MADV_POPULATE_WRITE: + case MADV_COLLAPSE: return 0; default: /* be safe, default to 1. 
list exceptions explicitly */ @@ -1054,6 +1055,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma, if (error) goto out; break; + case MADV_COLLAPSE: + return madvise_collapse(vma, prev, start, end); } anon_name = anon_vma_name(vma); @@ -1147,6 +1150,7 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: + case MADV_COLLAPSE: #endif case MADV_DONTDUMP: case MADV_DODUMP: @@ -1336,6 +1340,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.

From patchwork Mon May 2 18:17:10 2022
Subject: [PATCH v4 09/13] mm/khugepaged: rename prefix of shared collapse functions
From: "Zach O'Keefe"
Date: Mon, 2 May 2022 11:17:10 -0700
Message-Id: <20220502181714.3483177-10-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>

The following functions/tracepoints are shared between khugepaged and madvise collapse contexts.
Replace the "khugepaged_" prefix with the generic "hpage_collapse_" prefix in these cases:

huge_memory:mm_khugepaged_scan_pmd -> huge_memory:mm_hpage_collapse_scan_pmd
khugepaged_test_exit() -> hpage_collapse_test_exit()
khugepaged_scan_abort() -> hpage_collapse_scan_abort()
khugepaged_scan_pmd() -> hpage_collapse_scan_pmd()
khugepaged_find_target_node() -> hpage_collapse_find_target_node()

Signed-off-by: Zach O'Keefe --- include/trace/events/huge_memory.h | 2 +- mm/khugepaged.c | 72 ++++++++++++++++-------- 2 files changed, 39 insertions(+), 35 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index 55392bf30a03..fb6c73632ff3 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -48,7 +48,7 @@ SCAN_STATUS #define EM(a, b) {a, b}, #define EMe(a, b) {a, b} -TRACE_EVENT(mm_khugepaged_scan_pmd, +TRACE_EVENT(mm_hpage_collapse_scan_pmd, TP_PROTO(struct mm_struct *mm, struct page *page, bool writable, int referenced, int none_or_zero, int status, int unmapped), diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 3ba2c570da5e..9f1b7e9e78c2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -96,7 +96,7 @@ struct collapse_control { /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; - /* Last target selected in khugepaged_find_target_node() */ + /* Last target selected in hpage_collapse_find_target_node() */ int last_target_node; struct page *hpage; @@ -453,7 +453,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm, hash_add(mm_slots_hash, &mm_slot->hash, (long)mm); } -static inline int khugepaged_test_exit(struct mm_struct *mm) +static inline int hpage_collapse_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) == 0; } @@ -502,7 +502,7 @@ int __khugepaged_enter(struct mm_struct *mm) return -ENOMEM; /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); + VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) { free_mm_slot(mm_slot); return 0; @@ -566,11 +566,10 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * khugepaged_test_exit() (which is guaranteed to run - * under mmap sem read mode). Stop here (after we - * return all pagetables will be destroyed) until - * khugepaged has finished working on the pagetables - * under the mmap_lock. + * hpage_collapse_test_exit() (which is guaranteed to run + * under mmap sem read mode). Stop here (after we return all + * pagetables will be destroyed) until khugepaged has finished + * working on the pagetables under the mmap_lock.
*/ mmap_write_lock(mm); mmap_write_unlock(mm); @@ -807,7 +806,7 @@ static void khugepaged_alloc_sleep(void) remove_wait_queue(&khugepaged_wait, &wait); } -static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) +static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) { int i; @@ -854,7 +853,7 @@ static bool alloc_hpage(gfp_t gfp, int node, struct collapse_control *cc) } #ifdef CONFIG_NUMA -static int khugepaged_find_target_node(struct collapse_control *cc) +static int hpage_collapse_find_target_node(struct collapse_control *cc) { int nid, target_node = 0, max_value = 0; @@ -901,7 +900,7 @@ static bool khugepaged_alloc_page(gfp_t gfp, int node, return alloc_hpage(gfp, node, cc); } #else -static int khugepaged_find_target_node(struct collapse_control *cc) +static int hpage_collapse_find_target_node(struct collapse_control *cc) { return 0; } @@ -981,7 +980,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, struct vm_area_struct *vma; unsigned long hstart, hend; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) return SCAN_ANY_PROCESS; *vmap = vma = find_vma(mm, address); @@ -1025,7 +1024,7 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm, /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if khugepaged_scan_pmd believes it is worthwhile. + * Only done if hpage_collapse_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held, * but with mmap_lock held to protect against vma changes. @@ -1092,7 +1091,7 @@ static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) const struct cpumask *cpumask; #endif gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; - int node = khugepaged_find_target_node(cc); + int node = hpage_collapse_find_target_node(cc); #ifdef CONFIG_NUMA /* sched to specified node before huge page memory copy */ @@ -1262,9 +1261,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, return result; } -static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, bool *mmap_locked, - struct collapse_control *cc) +static int hpage_collapse_scan_pmd(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, bool *mmap_locked, + struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1358,7 +1358,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, * hit record. 
*/ node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; goto out_unmap; } @@ -1423,8 +1423,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unmapped, cc); } out: - trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, - none_or_zero, result, unmapped); + trace_mm_hpage_collapse_scan_pmd(mm, page, writable, referenced, + none_or_zero, result, unmapped); return result; } @@ -1434,7 +1434,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot) lockdep_assert_held(&khugepaged_mm_lock); - if (khugepaged_test_exit(mm)) { + if (hpage_collapse_test_exit(mm)) { /* free mm_slot */ hash_del(&mm_slot->hash); list_del(&mm_slot->mm_node); @@ -1605,7 +1605,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) if (!mmap_write_trylock(mm)) return; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto out; for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++) @@ -1668,7 +1668,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * it'll always mapped in small page size for uffd-wp * registered ranges. */ - if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) + if (!hpage_collapse_test_exit(mm) && + !userfaultfd_wp(vma)) collapse_and_free_pmd(mm, vma, addr, pmd); mmap_write_unlock(mm); } else { @@ -2094,7 +2095,7 @@ static int khugepaged_scan_file(struct mm_struct *mm, } node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; break; } @@ -2183,7 +2184,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, vma = NULL; if (unlikely(!mmap_read_trylock(mm))) goto breakouterloop_mmap_lock; - if (likely(!khugepaged_test_exit(mm))) + if (likely(!hpage_collapse_test_exit(mm))) vma = find_vma(mm, khugepaged_scan.address); progress++; @@ -2191,7 +2192,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, unsigned long hstart, hend; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) { + if (unlikely(hpage_collapse_test_exit(mm))) { progress++; break; } @@ -2217,7 +2218,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, bool mmap_locked; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto breakouterloop; VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2234,9 +2235,10 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, cc); fput(file); } else { - result = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - &mmap_locked, cc); + result = hpage_collapse_scan_pmd(mm, vma, + khugepaged_scan.address, + &mmap_locked, + cc); } if (result == SCAN_SUCCEED) ++khugepaged_pages_collapsed; @@ -2260,7 +2262,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. 
*/ - if (khugepaged_test_exit(mm) || !vma) { + if (hpage_collapse_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find @@ -2500,7 +2502,8 @@ static int madvise_collapse_errno(enum scan_result r) static int madvise_alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) { - if (!alloc_hpage(GFP_TRANSHUGE, khugepaged_find_target_node(cc), cc)) + if (!alloc_hpage(GFP_TRANSHUGE, hpage_collapse_find_target_node(cc), + cc)) return SCAN_ALLOC_HUGE_PAGE_FAIL; if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, GFP_TRANSHUGE))) @@ -2547,13 +2550,14 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, cond_resched(); result = SCAN_FAIL; - if (unlikely(khugepaged_test_exit(mm))) { + if (unlikely(hpage_collapse_test_exit(mm))) { result = SCAN_ANY_PROCESS; break; } memset(cc.node_load, 0, sizeof(cc.node_load)); - result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, &cc); + result = hpage_collapse_scan_pmd(mm, vma, addr, &mmap_locked, + &cc); if (!mmap_locked) *prev = NULL; /* tell madvise we dropped mmap_lock */

From patchwork Mon May 2 18:17:11 2022
Subject: [PATCH v4 10/13] mm/madvise: add MADV_COLLAPSE to process_madvise()
From: "Zach O'Keefe"
Date: Mon, 2 May 2022 11:17:11 -0700
Message-Id: <20220502181714.3483177-11-zokeefe@google.com>
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>

Allow MADV_COLLAPSE behavior for process_madvise(2) if the caller has CAP_SYS_ADMIN or is requesting collapse of its own memory.
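A sketch of remote usage via the raw syscall (collapse_remote is a hypothetical helper; the 440 fallback assumes the common 64-bit syscall number for process_madvise):

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SYS_process_madvise
#define SYS_process_madvise 440	/* common 64-bit value; assumption */
#endif
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/*
 * Hypothetical helper: request collapse of [start, start + len) in the
 * process named by pidfd.  Per the check added below, this needs either
 * CAP_SYS_ADMIN or a pidfd referring to the caller itself.
 */
static long collapse_remote(int pidfd, void *start, size_t len)
{
	struct iovec iov = { .iov_base = start, .iov_len = len };

	return syscall(SYS_process_madvise, pidfd, &iov, 1, MADV_COLLAPSE, 0);
}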
Signed-off-by: Zach O'Keefe --- mm/madvise.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 638517952bd2..08c11217025a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior) } static bool -process_madvise_behavior_valid(int behavior) +process_madvise_behavior_valid(int behavior, struct task_struct *task) { switch (behavior) { case MADV_COLD: case MADV_PAGEOUT: case MADV_WILLNEED: return true; + case MADV_COLLAPSE: + return task == current || capable(CAP_SYS_ADMIN); default: return false; } @@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto free_iov; } - if (!process_madvise_behavior_valid(behavior)) { + if (!process_madvise_behavior_valid(behavior, task)) { ret = -EINVAL; goto release_task; } From patchwork Mon May 2 18:17:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12834615 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5D8CC433F5 for ; Mon, 2 May 2022 18:17:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4DA546B0083; Mon, 2 May 2022 14:17:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3C88F6B0085; Mon, 2 May 2022 14:17:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1C9B76B0087; Mon, 2 May 2022 14:17:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 072466B0083 for ; Mon, 2 May 2022 14:17:48 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id CB38160FD3 for ; Mon, 2 May 2022 18:17:47 +0000 (UTC) X-FDA: 79421611374.19.40BF57A Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) by imf18.hostedemail.com (Postfix) with ESMTP id 9835A1C0073 for ; Mon, 2 May 2022 18:17:40 +0000 (UTC) Received: by mail-pl1-f201.google.com with SMTP id x23-20020a170902b41700b0015ea144789fso1544581plr.13 for ; Mon, 02 May 2022 11:17:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=oOYaPFS0snZF0jcIguSnt+fI0DE93HtI6yWk8Dzmu68=; b=V8UoMpIJzKn/WQL+az5vFTBLiMxGl2QFFTvG3bzNp7n+xcAWwkZXICv5LVaaYYgHEU ZHbc8k8HFJoSx/y4RdBQ4qjA8MJ9EgwZWTq3wiPWdwhbV4DcL1OUQgqsURbfMjrwaNhu gQlUcAtHXVbGws/6yp37qMOB7UsM+kOTPFrDHRHX3xfRfyHtlH4cQ1u42oRF0w1z/N+Y zKJ8qPssJh8NY83Em+iIe9QvNF6M4bNXQM8nVDw7w1Sm4hU6O1YwAC/ZkkB+NyEHv9si 8o45tDGMPMIIjuFmtDe8y4TFZOEZx2NZGuzyBHJbNdENoZio31SoVRFOMpejRNhXLTJz 51zA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=oOYaPFS0snZF0jcIguSnt+fI0DE93HtI6yWk8Dzmu68=; b=M/tvpOpRANYtX58MUq57OktRURKyIM3GkWEzD749+hBPFcUSGX0QhN8NsYTpKmLxN8 bqcrNjQ8jMxuY2sdOGUSD2+5m6ISJ3kcRaTNTBHg+9SSFBiHJ1Tbt6kxvL0iaYfr5C9Q 0eOh9T3m8Gj7nFO7Dkh99s1m2dF3oYbj1M9xtozCmmTmSZnT1vnnSF31sAGArOyXw2r4 lZnOSNMgIBCfClPCE3V3R47e2qUWSmIIkpdYNm3OXIHM+ZYn8O5mbnrPm8iAELiYczfv 
From patchwork Mon May 2 18:17:12 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12834615
Date: Mon, 2 May 2022 11:17:12 -0700
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
Message-Id: <20220502181714.3483177-12-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>
Subject: [PATCH v4 11/13] selftests/vm: modularize collapse selftests
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Modularize the collapse action of the khugepaged collapse selftests by
introducing struct collapse_context, which specifies how to collapse a
given memory range and the expected semantics of that collapse. The
structure can be reused later to test other collapse contexts.
Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 257 +++++++++++-------------
 1 file changed, 116 insertions(+), 141 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 155120b67a16..c59d832fee96 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -23,6 +23,12 @@ static int hpage_pmd_nr;
 #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
 #define PID_SMAPS "/proc/self/smaps"
 
+struct collapse_context {
+	const char *name;
+	void (*collapse)(const char *msg, char *p, bool expect);
+	bool enforce_pte_scan_limits;
+};
+
 enum thp_enabled {
 	THP_ALWAYS,
 	THP_MADVISE,
@@ -528,53 +534,39 @@ static void alloc_at_fault(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full(void)
+static void collapse_full(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
 	fill_memory(p, 0, hpage_pmd_size);
-	if (wait_for_scan("Collapse fully populated PTE table", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse fully populated PTE table", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_empty(void)
+static void collapse_empty(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
-	if (wait_for_scan("Do not collapse empty PTE table", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	context->collapse("Do not collapse empty PTE table", p, false);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry(void)
+static void collapse_single_pte_entry(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
 	fill_memory(p, 0, page_size);
-	if (wait_for_scan("Collapse PTE table with single PTE entry present", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table with single PTE entry present", p,
+			  true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_none(void)
+static void collapse_max_ptes_none(struct collapse_context *context)
 {
 	int max_ptes_none = hpage_pmd_nr / 2;
 	struct settings settings = default_settings;
@@ -586,28 +578,23 @@ static void collapse_max_ptes_none(void)
 	p = alloc_mapping();
 
 	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
-	if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	context->collapse("Maybe collapse with max_ptes_none exceeded", p,
+			  !context->enforce_pte_scan_limits);
 	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
 
-	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
-	if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+	if (context->enforce_pte_scan_limits) {
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+		context->collapse("Collapse with max_ptes_none PTEs empty", p,
+				  true);
+		validate_memory(p, 0,
+				(hpage_pmd_nr - max_ptes_none) * page_size);
+	}
 
 	munmap(p, hpage_pmd_size);
 	write_settings(&default_settings);
 }
 
-static void collapse_swapin_single_pte(void)
+static void collapse_swapin_single_pte(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
@@ -625,18 +612,14 @@ static void collapse_swapin_single_pte(void)
 		goto out;
 	}
 
-	if (wait_for_scan("Collapse with swapping in single PTE entry", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse with swapping in single PTE entry",
+			  p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 out:
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_swap(void)
+static void collapse_max_ptes_swap(struct collapse_context *context)
 {
 	int max_ptes_swap = read_num("khugepaged/max_ptes_swap");
 	void *p;
@@ -656,39 +639,34 @@ static void collapse_max_ptes_swap(void)
 		goto out;
 	}
 
-	if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	context->collapse("Maybe collapse with max_ptes_swap exceeded",
+			  p, !context->enforce_pte_scan_limits);
 	validate_memory(p, 0, hpage_pmd_size);
 
-	fill_memory(p, 0, hpage_pmd_size);
-	printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr);
-	if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
-		perror("madvise(MADV_PAGEOUT)");
-		exit(EXIT_FAILURE);
-	}
-	if (check_swap(p, max_ptes_swap * page_size)) {
-		success("OK");
-	} else {
-		fail("Fail");
-		goto out;
-	}
+	if (context->enforce_pte_scan_limits) {
+		fill_memory(p, 0, hpage_pmd_size);
+		printf("Swapout %d of %d pages...", max_ptes_swap,
+		       hpage_pmd_nr);
+		if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
+			perror("madvise(MADV_PAGEOUT)");
+			exit(EXIT_FAILURE);
+		}
+		if (check_swap(p, max_ptes_swap * page_size)) {
+			success("OK");
+		} else {
+			fail("Fail");
+			goto out;
+		}
 
-	if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-	validate_memory(p, 0, hpage_pmd_size);
+		context->collapse("Collapse with max_ptes_swap pages swapped out",
+				  p, true);
+		validate_memory(p, 0, hpage_pmd_size);
+	}
 out:
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry_compound(void)
+static void collapse_single_pte_entry_compound(struct collapse_context *context)
 {
 	void *p;
 
@@ -710,17 +688,13 @@ static void collapse_single_pte_entry_compound(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table with single PTE mapping compound page",
+			  p, true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full_of_compound(void)
+static void collapse_full_of_compound(struct collapse_context *context)
 {
 	void *p;
 
@@ -742,17 +716,12 @@ static void collapse_full_of_compound(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table full of compound pages", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table full of compound pages", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_compound_extreme(void)
+static void collapse_compound_extreme(struct collapse_context *context)
 {
 	void *p;
 	int i;
@@ -798,18 +767,14 @@ static void collapse_compound_extreme(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table full of different compound pages", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table full of different compound pages",
+			  p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork(void)
+static void collapse_fork(struct collapse_context *context)
 {
 	int wstatus;
 	void *p;
@@ -835,13 +800,8 @@ static void collapse_fork(void)
 			fail("Fail");
 
 		fill_memory(p, page_size, 2 * page_size);
-
-		if (wait_for_scan("Collapse PTE table with single page shared with parent process", p))
-			fail("Timeout");
-		else if (check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
+		context->collapse("Collapse PTE table with single page shared with parent process",
+				  p, true);
 
 		validate_memory(p, 0, page_size);
 		munmap(p, hpage_pmd_size);
@@ -860,7 +820,7 @@ static void collapse_fork(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork_compound(void)
+static void collapse_fork_compound(struct collapse_context *context)
 {
 	int wstatus;
 	void *p;
@@ -896,14 +856,10 @@ static void collapse_fork_compound(void)
 		fill_memory(p, 0, page_size);
 
 		write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1);
-		if (wait_for_scan("Collapse PTE table full of compound pages in child", p))
-			fail("Timeout");
-		else if (check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
+		context->collapse("Collapse PTE table full of compound pages in child",
+				  p, true);
 		write_num("khugepaged/max_ptes_shared",
-			default_settings.khugepaged.max_ptes_shared);
+			  default_settings.khugepaged.max_ptes_shared);
 
 		validate_memory(p, 0, hpage_pmd_size);
 		munmap(p, hpage_pmd_size);
@@ -922,7 +878,7 @@ static void collapse_fork_compound(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_shared()
+static void collapse_max_ptes_shared(struct collapse_context *context)
 {
 	int max_ptes_shared = read_num("khugepaged/max_ptes_shared");
 	int wstatus;
@@ -957,28 +913,22 @@ static void collapse_max_ptes_shared()
 		else
 			fail("Fail");
 
-		if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p))
-			fail("Timeout");
-		else if (!check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
-
-		printf("Trigger CoW on page %d of %d...",
-		       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
-		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size);
-		if (!check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
-
-
-		if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p))
-			fail("Timeout");
-		else if (check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
+		context->collapse("Maybe collapse with max_ptes_shared exceeded",
+				  p, !context->enforce_pte_scan_limits);
+
+		if (context->enforce_pte_scan_limits) {
+			printf("Trigger CoW on page %d of %d...",
+			       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
+			fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) *
+				    page_size);
+			if (!check_huge(p))
+				success("OK");
+			else
+				fail("Fail");
+
+			context->collapse("Collapse with max_ptes_shared PTEs shared",
+					  p, true);
+		}
 
 		validate_memory(p, 0, hpage_pmd_size);
 		munmap(p, hpage_pmd_size);
@@ -997,8 +947,27 @@ static void collapse_max_ptes_shared()
 	munmap(p, hpage_pmd_size);
 }
 
+static void khugepaged_collapse(const char *msg, char *p, bool expect)
+{
+	if (wait_for_scan(msg, p))
+		fail("Timeout");
+	else if (check_huge(p) == expect)
+		success("OK");
+	else
+		fail("Fail");
+}
+
 int main(void)
 {
+	struct collapse_context contexts[] = {
+		{
+			.name = "khugepaged",
+			.collapse = &khugepaged_collapse,
+			.enforce_pte_scan_limits = true,
+		},
+	};
+	int i;
+
 	setbuf(stdout, NULL);
 
 	page_size = getpagesize();
@@ -1014,18 +983,24 @@ int main(void)
 	adjust_settings();
 
 	alloc_at_fault();
-	collapse_full();
-	collapse_empty();
-	collapse_single_pte_entry();
-	collapse_max_ptes_none();
-	collapse_swapin_single_pte();
-	collapse_max_ptes_swap();
-	collapse_single_pte_entry_compound();
-	collapse_full_of_compound();
-	collapse_compound_extreme();
-	collapse_fork();
-	collapse_fork_compound();
-	collapse_max_ptes_shared();
+
+	for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) {
+		struct collapse_context *c = &contexts[i];
+
+		printf("\n*** Testing context: %s ***\n", c->name);
+		collapse_full(c);
+		collapse_empty(c);
+		collapse_single_pte_entry(c);
+		collapse_max_ptes_none(c);
+		collapse_swapin_single_pte(c);
+		collapse_max_ptes_swap(c);
+		collapse_single_pte_entry_compound(c);
+		collapse_full_of_compound(c);
+		collapse_compound_extreme(c);
+		collapse_fork(c);
+		collapse_fork_compound(c);
+		collapse_max_ptes_shared(c);
+	}
 
 	restore_settings(0);
 }
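Before the next patch, it may help to see the shape of the abstraction
this creates. A hypothetical sketch of what another entry in contexts[]
would look like; example_collapse and example_context are illustrative
names, not part of the series:

/*
 * Sketch: a new collapse mechanism plugs into the selftests by
 * supplying a callback with these semantics:
 *   msg    - test description to print
 *   p      - start of the hugepage-aligned region under test
 *   expect - whether the region should be PMD-mapped afterwards
 */
static void example_collapse(const char *msg, char *p, bool expect)
{
	printf("%s...", msg);
	/* ... drive the mechanism under test on [p, p + hpage_pmd_size) ... */
	if (check_huge(p) == expect)
		success("OK");
	else
		fail("Fail");
}

static struct collapse_context example_context = {
	.name = "example",
	.collapse = &example_collapse,
	/* false if the mechanism ignores the max_ptes_* tunables */
	.enforce_pte_scan_limits = false,
};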
From patchwork Mon May 2 18:17:13 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12834616
Date: Mon, 2 May 2022 11:17:13 -0700
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
Message-Id: <20220502181714.3483177-13-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>
Subject: [PATCH v4 12/13] selftests/vm: add MADV_COLLAPSE collapse context to selftests
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Add MADV_COLLAPSE selftests. Extend struct collapse_context to support
context initialization/cleanup. This is used by the madvise collapse
context to "disable" and "enable" khugepaged, since it would otherwise
interfere with the tests.

The mechanism used to "disable" khugepaged is a hack: it sets
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to
a large value and feeds khugepaged enough suitable VMAs/pages to keep
khugepaged sleeping for the duration of the madvise collapse tests.
Since khugepaged is woken when this file is written, enough VMAs must
be queued to put khugepaged back to sleep when the tests write to this
file in write_settings().
Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 133 ++++++++++++++++++++++--
 1 file changed, 125 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index c59d832fee96..e0ccc9443f78 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -14,17 +14,23 @@
 #ifndef MADV_PAGEOUT
 #define MADV_PAGEOUT 21
 #endif
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
 
 #define BASE_ADDR ((void *)(1UL << 30))
 static unsigned long hpage_pmd_size;
 static unsigned long page_size;
 static int hpage_pmd_nr;
+static int num_khugepaged_wakeups;
 
 #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
 #define PID_SMAPS "/proc/self/smaps"
 
 struct collapse_context {
 	const char *name;
+	bool (*init_context)(void);
+	bool (*cleanup_context)(void);
 	void (*collapse)(const char *msg, char *p, bool expect);
 	bool enforce_pte_scan_limits;
 };
@@ -264,6 +270,17 @@ static void write_num(const char *name, unsigned long num)
 	}
 }
 
+/*
+ * Use this macro instead of write_settings inside tests, and should
+ * be called at most once per callsite.
+ *
+ * Hack to statically count the number of times khugepaged is woken up due to
+ * writes to
+ * /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs,
+ * and is stored in __COUNTER__.
+ */
+#define WRITE_SETTINGS(s) do { __COUNTER__; write_settings(s); } while (0)
+
 static void write_settings(struct settings *settings)
 {
 	struct khugepaged_settings *khugepaged = &settings->khugepaged;
@@ -332,7 +349,7 @@ static void adjust_settings(void)
 {
 	printf("Adjust settings...");
-	write_settings(&default_settings);
+	WRITE_SETTINGS(&default_settings);
 	success("OK");
 }
@@ -440,20 +457,25 @@ static bool check_swap(void *addr, unsigned long size)
 	return swap;
 }
 
-static void *alloc_mapping(void)
+static void *alloc_mapping_at(void *at, size_t size)
 {
 	void *p;
 
-	p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE,
-		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-	if (p != BASE_ADDR) {
-		printf("Failed to allocate VMA at %p\n", BASE_ADDR);
+	p = mmap(at, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE,
+		 -1, 0);
+	if (p != at) {
+		printf("Failed to allocate VMA at %p\n", at);
 		exit(EXIT_FAILURE);
 	}
 
 	return p;
 }
 
+static void *alloc_mapping(void)
+{
+	return alloc_mapping_at(BASE_ADDR, hpage_pmd_size);
+}
+
 static void fill_memory(int *p, unsigned long start, unsigned long end)
 {
 	int i;
@@ -573,7 +595,7 @@ static void collapse_max_ptes_none(struct collapse_context *context)
 	void *p;
 
 	settings.khugepaged.max_ptes_none = max_ptes_none;
-	write_settings(&settings);
+	WRITE_SETTINGS(&settings);
 
 	p = alloc_mapping();
 
@@ -591,7 +613,7 @@ static void collapse_max_ptes_none(struct collapse_context *context)
 	}
 
 	munmap(p, hpage_pmd_size);
-	write_settings(&default_settings);
+	WRITE_SETTINGS(&default_settings);
 }
 
 static void collapse_swapin_single_pte(struct collapse_context *context)
@@ -947,6 +969,87 @@ static void collapse_max_ptes_shared(struct collapse_context *context)
 	munmap(p, hpage_pmd_size);
 }
 
+static void madvise_collapse(const char *msg, char *p, bool expect)
+{
+	int ret;
+
+	printf("%s...", msg);
+	/* Sanity check */
+	if (check_huge(p)) {
+		printf("Unexpected huge page\n");
+		exit(EXIT_FAILURE);
+	}
+
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	ret = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+	if (((bool)ret) == expect)
+		fail("Fail: Bad return value");
+	else if (check_huge(p) != expect)
+		fail("Fail: check_huge()");
+	else
+		success("OK");
+}
+
+static struct khugepaged_disable_state {
+	void *p;
+	size_t map_size;
+} khugepaged_disable_state;
+
+static bool disable_khugepaged(void)
+{
+	/*
+	 * Hack to "disable" khugepaged by setting
	 * /transparent_hugepage/khugepaged/scan_sleep_millisecs to some large
+	 * value, then feeding it enough suitable VMAs to scan and subsequently
+	 * sleep.
+	 *
+	 * khugepaged is woken up on writes to
+	 * /transparent_hugepage/khugepaged/scan_sleep_millisecs, so care must
+	 * be taken to not inadvertently wake khugepaged in these tests.
+	 *
+	 * Feed khugepaged 1 hugepage-sized VMA to scan and sleep on, then
+	 * N more for each time khugepaged would be woken up.
+	 */
+	size_t map_size = (num_khugepaged_wakeups + 1) * hpage_pmd_size;
+	void *p;
+	bool ret = true;
+	int full_scans;
+	int timeout = 6;	/* 3 seconds */
+
+	default_settings.khugepaged.scan_sleep_millisecs = 1000 * 60 * 10;
+	default_settings.khugepaged.pages_to_scan = 1;
+	write_settings(&default_settings);
+
+	p = alloc_mapping_at(((char *)BASE_ADDR) + (1UL << 30), map_size);
+	fill_memory(p, 0, map_size);
+
+	full_scans = read_num("khugepaged/full_scans") + 2;
+
+	printf("disabling khugepaged...");
+	while (timeout--) {
+		if (read_num("khugepaged/full_scans") >= full_scans) {
+			fail("Fail");
+			ret = false;
+			break;
+		}
+		printf(".");
+		usleep(TICK);
+	}
+	success("OK");
+	khugepaged_disable_state.p = p;
+	khugepaged_disable_state.map_size = map_size;
+	return ret;
+}
+
+static bool enable_khugepaged(void)
+{
+	printf("enabling khugepaged...");
+	munmap(khugepaged_disable_state.p, khugepaged_disable_state.map_size);
+	write_settings(&saved_settings);
+	success("OK");
+	return true;
+}
+
 static void khugepaged_collapse(const char *msg, char *p, bool expect)
 {
 	if (wait_for_scan(msg, p))
@@ -962,9 +1065,18 @@ int main(void)
 	struct collapse_context contexts[] = {
 		{
 			.name = "khugepaged",
+			.init_context = NULL,
+			.cleanup_context = NULL,
 			.collapse = &khugepaged_collapse,
 			.enforce_pte_scan_limits = true,
 		},
+		{
+			.name = "madvise",
+			.init_context = &disable_khugepaged,
+			.cleanup_context = &enable_khugepaged,
+			.collapse = &madvise_collapse,
+			.enforce_pte_scan_limits = false,
+		},
 	};
 	int i;
 
@@ -973,6 +1085,7 @@ int main(void)
 	page_size = getpagesize();
 	hpage_pmd_size = read_num("hpage_pmd_size");
 	hpage_pmd_nr = hpage_pmd_size / page_size;
+	num_khugepaged_wakeups = __COUNTER__;
 
 	default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1;
 	default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8;
@@ -988,6 +1101,8 @@ int main(void)
 		struct collapse_context *c = &contexts[i];
 
 		printf("\n*** Testing context: %s ***\n", c->name);
+		if (c->init_context && !c->init_context())
+			continue;
 		collapse_full(c);
 		collapse_empty(c);
 		collapse_single_pte_entry(c);
@@ -1000,6 +1115,8 @@ int main(void)
 		collapse_fork(c);
 		collapse_fork_compound(c);
 		collapse_max_ptes_shared(c);
+		if (c->cleanup_context && !c->cleanup_context())
+			break;
 	}
 
 	restore_settings(0);
From patchwork Mon May 2 18:17:14 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12834617
Date: Mon, 2 May 2022 11:17:14 -0700
In-Reply-To: <20220502181714.3483177-1-zokeefe@google.com>
Message-Id: <20220502181714.3483177-14-zokeefe@google.com>
References: <20220502181714.3483177-1-zokeefe@google.com>
Subject: [PATCH v4 13/13] selftests/vm: add test to verify recollapse of THPs
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Add a selftest specific to the madvise collapse context that verifies
MADV_COLLAPSE is "successful" when a hugepage-aligned/sized region is
already pmd-mapped.

Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 32 +++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index e0ccc9443f78..c36d04218083 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -969,6 +969,32 @@ static void collapse_max_ptes_shared(struct collapse_context *context)
 	munmap(p, hpage_pmd_size);
 }
 
+static void madvise_collapse_existing_thps(void)
+{
+	void *p;
+	int err;
+
+	p = alloc_mapping();
+	fill_memory(p, 0, hpage_pmd_size);
+
+	printf("Collapse fully populated PTE table...");
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	err = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+	if (err == 0 && check_huge(p)) {
+		success("OK");
+		printf("Re-collapse PMD-mapped hugepage");
+		err = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+		if (err == 0 && check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+	} else {
+		fail("Fail");
+	}
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
 static void madvise_collapse(const char *msg, char *p, bool expect)
 {
 	int ret;
@@ -1097,6 +1123,7 @@ int main(void)
 
 	alloc_at_fault();
 
+	/* Shared tests */
 	for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) {
 		struct collapse_context *c = &contexts[i];
 
@@ -1119,5 +1146,10 @@ int main(void)
 			break;
 	}
 
+	/* madvise-specific tests */
+	disable_khugepaged();
+	madvise_collapse_existing_thps();
+	enable_khugepaged();
+
 	restore_settings(0);
 }