From patchwork Wed Jul 6 23:59:19 2022
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:19 -0700
Message-Id: <20220706235936.2197195-2-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 01/18] mm/khugepaged:
 remove redundant transhuge_vma_suitable() check

transhuge_vma_suitable() is called twice in the hugepage_vma_revalidate()
path. Remove the first check, and rely on the second check inside
hugepage_vma_check().
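For reference, a condensed view of the redundancy being removed, pieced
together from the diff below (illustrative, not verbatim kernel source):

	if (!vma)
		return SCAN_VMA_NULL;
	if (!transhuge_vma_suitable(vma, address))  /* check #1, removed */
		return SCAN_ADDRESS_RANGE;
	if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
		return SCAN_VMA_CHECK;  /* check #2: per the commit message,
					 * this runs the same suitability
					 * test internally */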
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cfe231c5958f..5269d15e20f6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -918,8 +918,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	if (!vma)
 		return SCAN_VMA_NULL;
 
-	if (!transhuge_vma_suitable(vma, address))
-		return SCAN_ADDRESS_RANGE;
 	if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
 		return SCAN_VMA_CHECK;
 	/*

From patchwork Wed Jul 6 23:59:20 2022
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:20 -0700
Message-Id: <20220706235936.2197195-3-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 02/18] mm: khugepaged: don't carry huge page to
 the next loop for !CONFIG_NUMA

From: Yang Shi

khugepaged has an optimization that reduces huge page allocation calls
for !CONFIG_NUMA by carrying an allocated-but-failed-to-collapse huge
page over to the next loop iteration. The CONFIG_NUMA build doesn't do
so, since the next iteration may try to collapse a huge page from a
different node, so it doesn't make much sense to carry the page.

But when NUMA=n, the huge page is allocated by
khugepaged_prealloc_page() before scanning the address space (sketched
below), so a huge page may be allocated even though there is no suitable
range to collapse. The page is then simply freed if khugepaged has
already made enough progress. This can leave a NUMA=n run with 5 times
as many thp_collapse_alloc events as a NUMA=y run; the many pointless
THP allocations actually make things worse and defeat the purpose of the
optimization.

This could be fixed by carrying the huge page across scans, but that
would complicate the code further, and the huge page might end up being
carried indefinitely. Taking one step back, the optimization itself no
longer seems worth keeping:

  * Not many users build NUMA=n kernels nowadays, even when the kernel
    actually runs on a non-NUMA machine. Some small devices may run
    NUMA=n kernels, but they are unlikely to use THP.
  * Since commit 44042b449872 ("mm/page_alloc: allow high-order pages
    to be stored on the per-cpu lists"), THPs can be cached on the
    per-cpu lists, which already does much of what this optimization
    aimed for.
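For reference, an abridged sketch of the NUMA=n flow this patch removes,
condensed from the deleted lines in the diff below (illustrative, not
verbatim kernel source):

	/* khugepaged_do_scan(), before this patch */
	while (progress < pages) {
		/* may allocate a THP before any collapsible range is known */
		if (!khugepaged_prealloc_page(&hpage, &wait))
			break;
		...
		progress += khugepaged_scan_mm_slot(pages - progress, &hpage);
	}
	/* if nothing was collapsed, the preallocated page is just freed */
	if (!IS_ERR_OR_NULL(hpage))
		put_page(hpage);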
Signed-off-by: Yang Shi
Signed-off-by: Zach O'Keefe
Co-developed-by: Peter Xu
Signed-off-by: Peter Xu
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
---
 mm/khugepaged.c | 120 +++++++++++-------------------------------------
 1 file changed, 26 insertions(+), 94 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5269d15e20f6..196eaadbf415 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -796,29 +796,16 @@ static int khugepaged_find_target_node(void)
 	last_khugepaged_target_node = target_node;
 	return target_node;
 }
-
-static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+#else
+static int khugepaged_find_target_node(void)
 {
-	if (IS_ERR(*hpage)) {
-		if (!*wait)
-			return false;
-
-		*wait = false;
-		*hpage = NULL;
-		khugepaged_alloc_sleep();
-	} else if (*hpage) {
-		put_page(*hpage);
-		*hpage = NULL;
-	}
-
-	return true;
+	return 0;
 }
+#endif
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
-
 	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
@@ -830,74 +817,6 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	count_vm_event(THP_COLLAPSE_ALLOC);
 	return *hpage;
 }
-#else
-static int khugepaged_find_target_node(void)
-{
-	return 0;
-}
-
-static inline struct page *alloc_khugepaged_hugepage(void)
-{
-	struct page *page;
-
-	page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
-			   HPAGE_PMD_ORDER);
-	if (page)
-		prep_transhuge_page(page);
-	return page;
-}
-
-static struct page *khugepaged_alloc_hugepage(bool *wait)
-{
-	struct page *hpage;
-
-	do {
-		hpage = alloc_khugepaged_hugepage();
-		if (!hpage) {
-			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-			if (!*wait)
-				return NULL;
-
-			*wait = false;
-			khugepaged_alloc_sleep();
-		} else
-			count_vm_event(THP_COLLAPSE_ALLOC);
-	} while (unlikely(!hpage) && likely(hugepage_flags_enabled()));
-
-	return hpage;
-}
-
-static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
-{
-	/*
-	 * If the hpage allocated earlier was briefly exposed in page cache
-	 * before collapse_file() failed, it is possible that racing lookups
-	 * have not yet completed, and would then be unpleasantly surprised by
-	 * finding the hpage reused for the same mapping at a different offset.
-	 * Just release the previous allocation if there is any danger of that.
-	 */
-	if (*hpage && page_count(*hpage) > 1) {
-		put_page(*hpage);
-		*hpage = NULL;
-	}
-
-	if (!*hpage)
-		*hpage = khugepaged_alloc_hugepage(wait);
-
-	if (unlikely(!*hpage))
-		return false;
-
-	return true;
-}
-
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
-{
-	VM_BUG_ON(!*hpage);
-
-	return *hpage;
-}
-#endif
 
 /*
  * If mmap_lock temporarily dropped, revalidate vma
@@ -1148,8 +1067,10 @@ static void collapse_huge_page(struct mm_struct *mm,
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
+	if (!IS_ERR_OR_NULL(*hpage)) {
 		mem_cgroup_uncharge(page_folio(*hpage));
+		put_page(*hpage);
+	}
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
@@ -1951,8 +1872,10 @@ static void collapse_file(struct mm_struct *mm,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
+	if (!IS_ERR_OR_NULL(*hpage)) {
 		mem_cgroup_uncharge(page_folio(*hpage));
+		put_page(*hpage);
+	}
 	/* TODO: tracepoints */
 }
 
@@ -2197,10 +2120,7 @@ static void khugepaged_do_scan(void)
 
 	lru_add_drain_all();
 
-	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
-			break;
-
+	while (true) {
 		cond_resched();
 
 		if (unlikely(kthread_should_stop() || try_to_freeze()))
@@ -2216,10 +2136,22 @@ static void khugepaged_do_scan(void)
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
-	}
 
-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+		if (progress >= pages)
+			break;
+
+		if (IS_ERR(hpage)) {
+			/*
+			 * If fail to allocate the first time, try to sleep for
+			 * a while. When hit again, cancel the scan.
+			 */
+			if (!wait)
+				break;
+			wait = false;
+			hpage = NULL;
+			khugepaged_alloc_sleep();
+		}
+	}
 }
 
 static bool khugepaged_should_wakeup(void)

From patchwork Wed Jul 6 23:59:21 2022
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:21 -0700
Message-Id: <20220706235936.2197195-4-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 03/18] mm/khugepaged: add struct collapse_control

Modularize hugepage collapse by introducing struct collapse_control.
This structure describes the properties of the requested collapse and
also serves as a local scratch pad for use during the collapse itself.

Start by moving the global per-node khugepaged statistics into this new
structure. Note that the structure is still statically allocated, since
CONFIG_NODES_SHIFT can be arbitrarily large and stack-allocating a
MAX_NUMNODES-sized array could cause -Wframe-larger-than= errors.
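For reference, the new structure and its single static instance,
abridged from the diff below:

	struct collapse_control {
		/* Num pages scanned per node */
		int node_load[MAX_NUMNODES];

		/* Last target selected in khugepaged_find_target_node() */
		int last_target_node;
	};

	/* Statically allocated at file scope: a MAX_NUMNODES-sized array
	 * may be too large for the kernel stack. */
	struct collapse_control khugepaged_collapse_control = {
		.last_target_node = NUMA_NO_NODE,
	};

Scans then record per-node hits in cc->node_load[] instead of the old
global khugepaged_node_load[], so the statistics travel with the
collapse request.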
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 87 ++++++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 37 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 196eaadbf415..f1ef02d9fe07 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,6 +85,14 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 
 #define MAX_PTE_MAPPED_THP 8
 
+struct collapse_control {
+	/* Num pages scanned per node */
+	int node_load[MAX_NUMNODES];
+
+	/* Last target selected in khugepaged_find_target_node() */
+	int last_target_node;
+};
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -735,9 +743,12 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
-static int khugepaged_node_load[MAX_NUMNODES];
-static bool khugepaged_scan_abort(int nid)
+struct collapse_control khugepaged_collapse_control = {
+	.last_target_node = NUMA_NO_NODE,
+};
+
+static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
 
@@ -749,11 +760,11 @@ static bool khugepaged_scan_abort(int nid)
 		return false;
 
 	/* If there is a count for this node already, it must be acceptable */
-	if (khugepaged_node_load[nid])
+	if (cc->node_load[nid])
 		return false;
 
 	for (i = 0; i < MAX_NUMNODES; i++) {
-		if (!khugepaged_node_load[i])
+		if (!cc->node_load[i])
 			continue;
 		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
@@ -772,32 +783,31 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
-	static int last_khugepaged_target_node = NUMA_NO_NODE;
 	int nid, target_node = 0, max_value = 0;
 
 	/* find first node with max normal pages hit */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
-		if (khugepaged_node_load[nid] > max_value) {
-			max_value = khugepaged_node_load[nid];
+		if (cc->node_load[nid] > max_value) {
+			max_value = cc->node_load[nid];
 			target_node = nid;
 		}
 
 	/* do some balance if several nodes have the same hit record */
-	if (target_node <= last_khugepaged_target_node)
-		for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES;
-		     nid++)
-			if (max_value == khugepaged_node_load[nid]) {
+	if (target_node <= cc->last_target_node)
+		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
+		     nid++)
+			if (max_value == cc->node_load[nid]) {
 				target_node = nid;
 				break;
 			}
 
-	last_khugepaged_target_node = target_node;
+	cc->last_target_node = target_node;
 	return target_node;
 }
 #else
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -1075,10 +1085,9 @@ static void collapse_huge_page(struct mm_struct *mm,
 	return;
 }
 
-static int khugepaged_scan_pmd(struct mm_struct *mm,
-			       struct vm_area_struct *vma,
-			       unsigned long address,
-			       struct page **hpage)
+static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
+			       unsigned long address, struct page **hpage,
+			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1098,7 +1107,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		goto out;
 	}
 
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1164,16 +1173,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		/*
 		 * Record which node the original page is from and save this
-		 * information to khugepaged_node_load[].
+		 * information to cc->node_load[].
 		 * Khugepaged will allocate hugepage from the node has the max
 		 * hit record.
 		 */
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
 			goto out_unmap;
@@ -1224,7 +1233,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node();
+		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		collapse_huge_page(mm, address, hpage, node, referenced,
 				   unmapped);
@@ -1879,8 +1888,9 @@ static void collapse_file(struct mm_struct *mm,
 	/* TODO: tracepoints */
 }
 
-static void khugepaged_scan_file(struct mm_struct *mm,
-				 struct file *file, pgoff_t start, struct page **hpage)
+static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
+				 pgoff_t start, struct page **hpage,
+				 struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -1891,7 +1901,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 	present = 0;
 	swap = 0;
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
@@ -1916,11 +1926,11 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		}
 
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			break;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
@@ -1953,7 +1963,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node();
+			node = khugepaged_find_target_node(cc);
 			collapse_file(mm, file, start, hpage, node);
 		}
 	}
@@ -1961,8 +1971,9 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 	/* TODO: tracepoints */
 }
 #else
-static void khugepaged_scan_file(struct mm_struct *mm,
-				 struct file *file, pgoff_t start, struct page **hpage)
+static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
+				 pgoff_t start, struct page **hpage,
+				 struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -1973,7 +1984,8 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage)
+					    struct page **hpage,
+					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
 {
@@ -2050,12 +2062,13 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage);
+				khugepaged_scan_file(mm, file, pgoff, hpage,
+						     cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
 							  khugepaged_scan.address,
-							  hpage);
+							  hpage, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2111,7 +2124,7 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }
 
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan(struct collapse_control *cc)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2132,7 +2145,7 @@ static void khugepaged_do_scan(void)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage);
+							    &hpage, cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2188,7 +2201,7 @@ static int khugepaged(void *none)
 	set_user_nice(current, MAX_NICE);
 
 	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
+		khugepaged_do_scan(&khugepaged_collapse_control);
 		khugepaged_wait_work();
 	}

From patchwork Wed Jul 6 23:59:22 2022
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:22 -0700
Message-Id: <20220706235936.2197195-5-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 04/18] mm/khugepaged: dedup and simplify hugepage
 alloc and charging

The following code is duplicated in collapse_huge_page() and
collapse_file():

	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;

	new_page = khugepaged_alloc_page(hpage, gfp, node);
	if (!new_page) {
		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
		goto out;
	}

	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
		result = SCAN_CGROUP_CHARGE_FAIL;
		goto out;
	}

	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

Also, "node" is passed as an argument to both collapse_huge_page() and
collapse_file(), and is obtained the same way in both, via
khugepaged_find_target_node(). Move all of this into a new helper,
alloc_charge_hpage(), and remove the duplicated code from
collapse_huge_page() and collapse_file(). Also, simplify
khugepaged_alloc_page() by returning a bool indicating allocation
success instead of a copy of the allocated struct page *.
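With the helper in place, each call site reduces to the following
(abridged from the diff below; collapse_file() uses its own "out" label
rather than "out_nolock"):

	result = alloc_charge_hpage(hpage, mm, cc);
	if (result != SCAN_SUCCEED)
		goto out_nolock;

	new_page = *hpage;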
Suggested-by: Peter Xu
Acked-by: David Rientjes
Reviewed-by: Yang Shi
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 78 ++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f1ef02d9fe07..8068adf24620 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -813,19 +813,18 @@ static int khugepaged_find_target_node(struct collapse_control *cc)
 }
 #endif
 
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
 	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
-		return NULL;
+		return false;
 	}
 
 	prep_transhuge_page(*hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
-	return *hpage;
+	return true;
 }
 
 /*
@@ -921,10 +920,24 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       int node, int referenced, int unmapped)
+static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
+			      struct collapse_control *cc)
+{
+	/* Only allocate from the target node */
+	gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+	int node = khugepaged_find_target_node(cc);
+
+	if (!khugepaged_alloc_page(hpage, gfp, node))
+		return SCAN_ALLOC_HUGE_PAGE_FAIL;
+	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
+		return SCAN_CGROUP_CHARGE_FAIL;
+	count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC);
+	return SCAN_SUCCEED;
+}
+
+static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			       struct page **hpage, int referenced,
+			       int unmapped, struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -935,13 +948,9 @@ static void collapse_huge_page(struct mm_struct *mm,
 	int isolated = 0, result = 0;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
-	gfp_t gfp;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
 	/*
 	 * Before allocating the hugepage, release the mmap_lock read lock.
 	 * The allocation can take potentially a long time if it involves
 	 * that. We will recheck the vma after taking it again in write mode.
 	 */
 	mmap_read_unlock(mm);
 
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
-		goto out_nolock;
-	}
-
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
+	result = alloc_charge_hpage(hpage, mm, cc);
+	if (result != SCAN_SUCCEED)
 		goto out_nolock;
-	}
 
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+	new_page = *hpage;
 
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
@@ -1233,10 +1237,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, node,
-				   referenced, unmapped);
+		collapse_huge_page(mm, address, hpage, referenced, unmapped,
+				   cc);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1504,7 +1507,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @file: file that collapse on
  * @start: collapse start address
  * @hpage: new allocated huge page for collapse
- * @node: appointed node the new huge page allocate from
+ * @cc: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
  * - allocate and lock a new huge page;
@@ -1521,12 +1524,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * + restore gaps in the page cache;
  * + unlock and free huge page;
  */
-static void collapse_file(struct mm_struct *mm,
-			  struct file *file, pgoff_t start,
-			  struct page **hpage, int node)
+static void collapse_file(struct mm_struct *mm, struct file *file,
+			  pgoff_t start, struct page **hpage,
+			  struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
-	gfp_t gfp;
 	struct page *new_page;
 	pgoff_t index, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
@@ -1538,20 +1540,11 @@ static void collapse_file(struct mm_struct *mm,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+	result = alloc_charge_hpage(hpage, mm, cc);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}
 
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
-		goto out;
-	}
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+	new_page = *hpage;
 
 	/*
 	 * Ensure we have slots for all the pages in the range.  This is
@@ -1963,8 +1956,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node(cc);
-			collapse_file(mm, file, start, hpage, node);
+			collapse_file(mm, file, start, hpage, cc);
 		}
 	}

From patchwork Wed Jul 6 23:59:23 2022
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:23 -0700
Message-Id: <20220706235936.2197195-6-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 05/18] mm/khugepaged: pipe enum scan_result codes
 back to callers

Pipe enum scan_result codes back through the return values of functions
downstream of khugepaged_scan_file() and khugepaged_scan_pmd() to inform
callers whether the operation was successful, and if not, why. Since
khugepaged_scan_pmd()'s return value already has a specific meaning
(whether mmap_lock was unlocked or not), add a bool* argument to
khugepaged_scan_pmd() to retrieve this information.

Change khugepaged to take action based on the return values of
khugepaged_scan_file() and khugepaged_scan_pmd() instead of acting deep
within the collapsing functions themselves.

hugepage_vma_revalidate() now returns SCAN_SUCCEED on success to be more
consistent with enum scan_result propagation.

Remove the dependency on error pointers to communicate to khugepaged
that allocation failed and that it should sleep; instead just use the
result of the scan (SCAN_ALLOC_HUGE_PAGE_FAIL if allocation fails).
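A sketch of the resulting calling convention (illustrative, condensed
from the diff below; not verbatim kernel source):

	bool mmap_locked = true;
	int result;

	result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, cc);
	/*
	 * result is an enum scan_result code (SCAN_SUCCEED, SCAN_FAIL,
	 * SCAN_ALLOC_HUGE_PAGE_FAIL, ...), telling the caller why a
	 * collapse did or did not happen; mmap_locked reports whether
	 * mmap_lock is still held, since a collapse attempt drops it.
	 */
	if (!mmap_locked) {
		/* mmap_lock was dropped; revalidate before further use */
	}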
Signed-off-by: Zach O'Keefe
Reviewed-by: Yang Shi
---
 mm/khugepaged.c | 233 ++++++++++++++++++++++++------------------------
 1 file changed, 117 insertions(+), 116 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8068adf24620..147f5828f052 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -558,7 +558,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 {
 	struct page *page = NULL;
 	pte_t *_pte;
-	int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
+	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	bool writable = false;
 
@@ -672,13 +672,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		result = SCAN_SUCCEED;
 		trace_mm_collapse_huge_page_isolate(page, none_or_zero,
 						    referenced, writable, result);
-		return 1;
+		return result;
 	}
 out:
 	release_pte_pages(pte, _pte, compound_pagelist);
 	trace_mm_collapse_huge_page_isolate(page, none_or_zero,
 					    referenced, writable, result);
-	return 0;
+	return result;
 }
 
 static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
@@ -818,7 +818,6 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		*hpage = ERR_PTR(-ENOMEM);
 		return false;
 	}
 
@@ -830,8 +829,7 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 /*
  * If mmap_lock temporarily dropped, revalidate vma
  * before taking mmap_lock.
- * Return 0 if succeeds, otherwise return none-zero
- * value (scan code).
+ * Returns enum scan_result value.
  */
 static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
@@ -857,7 +855,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	 */
 	if (!vma->anon_vma || !vma_is_anonymous(vma))
 		return SCAN_VMA_CHECK;
-	return 0;
+	return SCAN_SUCCEED;
 }
 
 /*
@@ -868,10 +866,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
  * Note that if false is returned, mmap_lock will be released.
  */
-static bool __collapse_huge_page_swapin(struct mm_struct *mm,
-					struct vm_area_struct *vma,
-					unsigned long haddr, pmd_t *pmd,
-					int referenced)
+static int __collapse_huge_page_swapin(struct mm_struct *mm,
+				       struct vm_area_struct *vma,
+				       unsigned long haddr, pmd_t *pmd,
+				       int referenced)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
@@ -902,12 +900,13 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 		 */
 		if (ret & VM_FAULT_RETRY) {
 			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
-			return false;
+			/* Likely, but not guaranteed, that page lock failed */
+			return SCAN_PAGE_LOCK;
 		}
 		if (ret & VM_FAULT_ERROR) {
 			mmap_read_unlock(mm);
 			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
-			return false;
+			return SCAN_FAIL;
 		}
 		swapped_in++;
 	}
@@ -917,7 +916,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 		lru_add_drain();
 
 	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 1);
-	return true;
+	return SCAN_SUCCEED;
 }
 
 static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
@@ -935,17 +934,17 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
-static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			       struct page **hpage, int referenced,
-			       int unmapped, struct collapse_control *cc)
+static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			      int referenced, int unmapped,
+			      struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
 	pte_t *pte;
 	pgtable_t pgtable;
-	struct page *new_page;
+	struct page *hpage;
 	spinlock_t *pmd_ptl, *pte_ptl;
-	int isolated = 0, result = 0;
+	int result = SCAN_FAIL;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
@@ -959,15 +958,13 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_hpage(hpage, mm, cc);
+	result = alloc_charge_hpage(&hpage, mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
-	new_page = *hpage;
-
 	mmap_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result) {
+	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -979,14 +976,16 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		goto out_nolock;
 	}
 
-	/*
-	 * __collapse_huge_page_swapin will return with mmap_lock released
-	 * when it fails. So we jump out_nolock directly in that case.
-	 * Continuing to collapse causes inconsistency.
-	 */
-	if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
-						     pmd, referenced)) {
-		goto out_nolock;
+	if (unmapped) {
+		/*
+		 * __collapse_huge_page_swapin will return with mmap_lock
+		 * released when it fails. So we jump out_nolock directly in
+		 * that case. Continuing to collapse causes inconsistency.
+ */ + result = __collapse_huge_page_swapin(mm, vma, address, pmd, + referenced); + if (result != SCAN_SUCCEED) + goto out_nolock; } mmap_read_unlock(mm); @@ -997,7 +996,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, */ mmap_write_lock(mm); result = hugepage_vma_revalidate(mm, address, &vma); - if (result) + if (result != SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ if (mm_find_pmd(mm, address) != pmd) @@ -1024,11 +1023,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - isolated = __collapse_huge_page_isolate(vma, address, pte, - &compound_pagelist); + result = __collapse_huge_page_isolate(vma, address, pte, + &compound_pagelist); spin_unlock(pte_ptl); - if (unlikely(!isolated)) { + if (unlikely(result != SCAN_SUCCEED)) { pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); @@ -1040,7 +1039,6 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, pmd_populate(mm, pmd, pmd_pgtable(_pmd)); spin_unlock(pmd_ptl); anon_vma_unlock_write(vma->anon_vma); - result = SCAN_FAIL; goto out_up_write; } @@ -1050,8 +1048,8 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, */ anon_vma_unlock_write(vma->anon_vma); - __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl, - &compound_pagelist); + __collapse_huge_page_copy(pte, hpage, vma, address, pte_ptl, + &compound_pagelist); pte_unmap(pte); /* * spin_lock() below is not the equivalent of smp_wmb(), but @@ -1059,43 +1057,42 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, * avoid the copy_huge_page writes to become visible after * the set_pmd_at() write. */ - __SetPageUptodate(new_page); + __SetPageUptodate(hpage); pgtable = pmd_pgtable(_pmd); - _pmd = mk_huge_pmd(new_page, vma->vm_page_prot); + _pmd = mk_huge_pmd(hpage, vma->vm_page_prot); _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); - page_add_new_anon_rmap(new_page, vma, address); - lru_cache_add_inactive_or_unevictable(new_page, vma); + page_add_new_anon_rmap(hpage, vma, address); + lru_cache_add_inactive_or_unevictable(hpage, vma); pgtable_trans_huge_deposit(mm, pmd, pgtable); set_pmd_at(mm, address, pmd, _pmd); update_mmu_cache_pmd(vma, address, pmd); spin_unlock(pmd_ptl); - *hpage = NULL; + hpage = NULL; - khugepaged_pages_collapsed++; result = SCAN_SUCCEED; out_up_write: mmap_write_unlock(mm); out_nolock: - if (!IS_ERR_OR_NULL(*hpage)) { - mem_cgroup_uncharge(page_folio(*hpage)); - put_page(*hpage); + if (hpage) { + mem_cgroup_uncharge(page_folio(hpage)); + put_page(hpage); } - trace_mm_collapse_huge_page(mm, isolated, result); - return; + trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result); + return result; } static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, struct page **hpage, + unsigned long address, bool *mmap_locked, struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; - int ret = 0, result = 0, referenced = 0; + int result = SCAN_FAIL, referenced = 0; int none_or_zero = 0, shared = 0; struct page *page = NULL; unsigned long _address; @@ -1232,19 +1229,19 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; - ret = 1; } out_unmap: pte_unmap_unlock(pte, ptl); - if (ret) { + if (result == SCAN_SUCCEED) { + result = 
collapse_huge_page(mm, address, referenced, + unmapped, cc); /* collapse_huge_page will return with the mmap_lock released */ - collapse_huge_page(mm, address, hpage, referenced, unmapped, - cc); + *mmap_locked = false; } out: trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, none_or_zero, result, unmapped); - return ret; + return result; } static void collect_mm_slot(struct mm_slot *mm_slot) @@ -1506,7 +1503,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * @mm: process address space where collapse happens * @file: file that collapse on * @start: collapse start address - * @hpage: new allocated huge page for collapse * @cc: collapse context and scratchpad * * Basic scheme is simple, details are more complex: @@ -1524,12 +1520,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * + restore gaps in the page cache; * + unlock and free huge page; */ -static void collapse_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct page **hpage, - struct collapse_control *cc) +static int collapse_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct collapse_control *cc) { struct address_space *mapping = file->f_mapping; - struct page *new_page; + struct page *hpage; pgoff_t index, end = start + HPAGE_PMD_NR; LIST_HEAD(pagelist); XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER); @@ -1540,12 +1535,10 @@ static void collapse_file(struct mm_struct *mm, struct file *file, VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); - result = alloc_charge_hpage(hpage, mm, cc); + result = alloc_charge_hpage(&hpage, mm, cc); if (result != SCAN_SUCCEED) goto out; - new_page = *hpage; - /* * Ensure we have slots for all the pages in the range. This is * almost certainly a no-op because most of the pages must be present @@ -1562,14 +1555,14 @@ static void collapse_file(struct mm_struct *mm, struct file *file, } } while (1); - __SetPageLocked(new_page); + __SetPageLocked(hpage); if (is_shmem) - __SetPageSwapBacked(new_page); - new_page->index = start; - new_page->mapping = mapping; + __SetPageSwapBacked(hpage); + hpage->index = start; + hpage->mapping = mapping; /* - * At this point the new_page is locked and not up-to-date. + * At this point the hpage is locked and not up-to-date. * It's safe to insert it into the page cache, because nobody would * be able to map it or use it in another way until we unlock it. */ @@ -1597,7 +1590,7 @@ static void collapse_file(struct mm_struct *mm, struct file *file, result = SCAN_FAIL; goto xa_locked; } - xas_store(&xas, new_page); + xas_store(&xas, hpage); nr_none++; continue; } @@ -1739,19 +1732,19 @@ static void collapse_file(struct mm_struct *mm, struct file *file, list_add_tail(&page->lru, &pagelist); /* Finally, replace with the new page. 
*/ - xas_store(&xas, new_page); + xas_store(&xas, hpage); continue; out_unlock: unlock_page(page); put_page(page); goto xa_unlocked; } - nr = thp_nr_pages(new_page); + nr = thp_nr_pages(hpage); if (is_shmem) - __mod_lruvec_page_state(new_page, NR_SHMEM_THPS, nr); + __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr); else { - __mod_lruvec_page_state(new_page, NR_FILE_THPS, nr); + __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr); filemap_nr_thps_inc(mapping); /* * Paired with smp_mb() in do_dentry_open() to ensure @@ -1762,21 +1755,21 @@ static void collapse_file(struct mm_struct *mm, struct file *file, smp_mb(); if (inode_is_open_for_write(mapping->host)) { result = SCAN_FAIL; - __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr); + __mod_lruvec_page_state(hpage, NR_FILE_THPS, -nr); filemap_nr_thps_dec(mapping); goto xa_locked; } } if (nr_none) { - __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none); + __mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none); /* nr_none is always 0 for non-shmem. */ - __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none); + __mod_lruvec_page_state(hpage, NR_SHMEM, nr_none); } /* Join all the small entries into a single multi-index entry */ xas_set_order(&xas, start, HPAGE_PMD_ORDER); - xas_store(&xas, new_page); + xas_store(&xas, hpage); xa_locked: xas_unlock_irq(&xas); xa_unlocked: @@ -1798,11 +1791,11 @@ static void collapse_file(struct mm_struct *mm, struct file *file, index = start; list_for_each_entry_safe(page, tmp, &pagelist, lru) { while (index < page->index) { - clear_highpage(new_page + (index % HPAGE_PMD_NR)); + clear_highpage(hpage + (index % HPAGE_PMD_NR)); index++; } - copy_highpage(new_page + (page->index % HPAGE_PMD_NR), - page); + copy_highpage(hpage + (page->index % HPAGE_PMD_NR), + page); list_del(&page->lru); page->mapping = NULL; page_ref_unfreeze(page, 1); @@ -1813,23 +1806,22 @@ static void collapse_file(struct mm_struct *mm, struct file *file, index++; } while (index < end) { - clear_highpage(new_page + (index % HPAGE_PMD_NR)); + clear_highpage(hpage + (index % HPAGE_PMD_NR)); index++; } - SetPageUptodate(new_page); - page_ref_add(new_page, HPAGE_PMD_NR - 1); + SetPageUptodate(hpage); + page_ref_add(hpage, HPAGE_PMD_NR - 1); if (is_shmem) - set_page_dirty(new_page); - lru_cache_add(new_page); + set_page_dirty(hpage); + lru_cache_add(hpage); /* * Remove pte page tables, so we can re-fault the page as huge. 
*/ retract_page_tables(mapping, start); - *hpage = NULL; - - khugepaged_pages_collapsed++; + unlock_page(hpage); + hpage = NULL; } else { struct page *page; @@ -1868,22 +1860,23 @@ static void collapse_file(struct mm_struct *mm, struct file *file, VM_BUG_ON(nr_none); xas_unlock_irq(&xas); - new_page->mapping = NULL; + hpage->mapping = NULL; } - unlock_page(new_page); + if (hpage) + unlock_page(hpage); out: VM_BUG_ON(!list_empty(&pagelist)); - if (!IS_ERR_OR_NULL(*hpage)) { - mem_cgroup_uncharge(page_folio(*hpage)); - put_page(*hpage); + if (hpage) { + mem_cgroup_uncharge(page_folio(hpage)); + put_page(hpage); } /* TODO: tracepoints */ + return result; } -static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct page **hpage, - struct collapse_control *cc) +static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct collapse_control *cc) { struct page *page = NULL; struct address_space *mapping = file->f_mapping; @@ -1956,16 +1949,16 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { - collapse_file(mm, file, start, hpage, cc); + result = collapse_file(mm, file, start, cc); } } /* TODO: tracepoints */ + return result; } #else -static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct page **hpage, - struct collapse_control *cc) +static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct collapse_control *cc) { BUILD_BUG(); } @@ -1975,8 +1968,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) } #endif -static unsigned int khugepaged_scan_mm_slot(unsigned int pages, - struct page **hpage, +static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, struct collapse_control *cc) __releases(&khugepaged_mm_lock) __acquires(&khugepaged_mm_lock) @@ -1990,6 +1982,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, VM_BUG_ON(!pages); lockdep_assert_held(&khugepaged_mm_lock); + *result = SCAN_FAIL; if (khugepaged_scan.mm_slot) mm_slot = khugepaged_scan.mm_slot; @@ -2039,7 +2032,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK); while (khugepaged_scan.address < hend) { - int ret; + bool mmap_locked = true; + cond_resched(); if (unlikely(khugepaged_test_exit(mm))) goto breakouterloop; @@ -2053,20 +2047,28 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, khugepaged_scan.address); mmap_read_unlock(mm); - ret = 1; - khugepaged_scan_file(mm, file, pgoff, hpage, - cc); + *result = khugepaged_scan_file(mm, file, pgoff, + cc); + mmap_locked = false; fput(file); } else { - ret = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - hpage, cc); + *result = khugepaged_scan_pmd(mm, vma, + khugepaged_scan.address, + &mmap_locked, cc); } + if (*result == SCAN_SUCCEED) + ++khugepaged_pages_collapsed; /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; progress += HPAGE_PMD_NR; - if (ret) - /* we released mmap_lock so break loop */ + if (!mmap_locked) + /* + * We released mmap_lock so break loop. Note + * that we drop mmap_lock before all hugepage + * allocations, so if allocation fails, we are + * guaranteed to break here and report the + * correct result back to caller. 
+ */ goto breakouterloop_mmap_lock; if (progress >= pages) goto breakouterloop; } } @@ -2118,10 +2120,10 @@ static int khugepaged_wait_event(void) static void khugepaged_do_scan(struct collapse_control *cc) { - struct page *hpage = NULL; unsigned int progress = 0, pass_through_head = 0; unsigned int pages = READ_ONCE(khugepaged_pages_to_scan); bool wait = true; + int result = SCAN_SUCCEED; lru_add_drain_all(); @@ -2137,7 +2139,7 @@ static void khugepaged_do_scan(struct collapse_control *cc) if (khugepaged_has_work() && pass_through_head < 2) progress += khugepaged_scan_mm_slot(pages - progress, - &hpage, cc); + &result, cc); else progress = pages; spin_unlock(&khugepaged_mm_lock); @@ -2145,7 +2147,7 @@ static void khugepaged_do_scan(struct collapse_control *cc) if (progress >= pages) break; - if (IS_ERR(hpage)) { + if (result == SCAN_ALLOC_HUGE_PAGE_FAIL) { /* * If fail to allocate the first time, try to sleep for * a while. When hit again, cancel the scan. */ if (!wait) break; wait = false; - hpage = NULL; khugepaged_alloc_sleep(); } }

From patchwork Wed Jul 6 23:59:24 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908930
Subject: [mm-unstable v7 06/18] mm/khugepaged: add flag to predicate khugepaged-only behavior
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:24 -0700
Message-Id: <20220706235936.2197195-7-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>

Add an .is_khugepaged flag to struct collapse_control so khugepaged-specific behavior can be elided in the MADV_COLLAPSE context. Start by protecting the khugepaged-specific heuristics behind this flag. In MADV_COLLAPSE, the user presumably has reason to believe the collapse will be beneficial, and khugepaged heuristics shouldn't prevent the user from doing so:

1) sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]
2) requirement that some pages in the region being collapsed be young or referenced

Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi --- v6 -> v7: There is no functional change here from v6, just a renaming of flags to explicitly be predicated on khugepaged. --- mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++--------------- 1 file changed, 43 insertions(+), 19 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 147f5828f052..d89056d8cbad 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -73,6 +73,8 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait); * default collapse hugepages if there is at least one pte mapped like * it would have happened if the vma was large enough during page * fault. + * + * Note that these are only respected if collapse was initiated by khugepaged. */ static unsigned int khugepaged_max_ptes_none __read_mostly; static unsigned int khugepaged_max_ptes_swap __read_mostly; @@ -86,6 +88,8 @@ static struct kmem_cache *mm_slot_cache __read_mostly; #define MAX_PTE_MAPPED_THP 8 struct collapse_control { + bool is_khugepaged; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -554,6 +558,7 @@ static bool is_refcount_suitable(struct page *page) static int __collapse_huge_page_isolate(struct vm_area_struct *vma, unsigned long address, pte_t *pte, + struct collapse_control *cc, struct list_head *compound_pagelist) { struct page *page = NULL; @@ -567,7 +572,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (pte_none(pteval) || (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->is_khugepaged)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -587,8 +593,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageAnon(page), page); - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->is_khugepaged && page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -654,10 +660,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (PageCompound(page)) list_add_tail(&page->lru, compound_pagelist); next: - /* There should be enough young pte to collapse the page */ - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + /* + * If collapse was initiated by khugepaged, check that there is + * enough young pte to justify collapsing the page + */ + if (cc->is_khugepaged && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; if (pte_write(pteval)) @@ -666,7 +676,7 @@ static int
__collapse_huge_page_isolate(struct vm_area_struct *vma, if (unlikely(!writable)) { result = SCAN_PAGE_RO; - } else if (unlikely(!referenced)) { + } else if (unlikely(cc->is_khugepaged && !referenced)) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -745,6 +755,7 @@ static void khugepaged_alloc_sleep(void) struct collapse_control khugepaged_collapse_control = { + .is_khugepaged = true, .last_target_node = NUMA_NO_NODE, }; @@ -1023,7 +1034,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - result = __collapse_huge_page_isolate(vma, address, pte, + result = __collapse_huge_page_isolate(vma, address, pte, cc, &compound_pagelist); spin_unlock(pte_ptl); @@ -1114,7 +1125,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, _pte++, _address += PAGE_SIZE) { pte_t pteval = *_pte; if (is_swap_pte(pteval)) { - if (++unmapped <= khugepaged_max_ptes_swap) { + if (++unmapped <= khugepaged_max_ptes_swap || + !cc->is_khugepaged) { /* * Always be strict with uffd-wp * enabled swap entries. Please see @@ -1133,7 +1145,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->is_khugepaged)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -1163,8 +1176,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, goto out_unmap; } - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->is_khugepaged && + page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out_unmap; @@ -1218,14 +1232,22 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, result = SCAN_PAGE_COUNT; goto out_unmap; } - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + + /* + * If collapse was initiated by khugepaged, check that there is + * enough young pte to justify collapsing the page + */ + if (cc->is_khugepaged && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; } if (!writable) { result = SCAN_PAGE_RO; - } else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) { + } else if (cc->is_khugepaged && + (!referenced || + (unmapped && referenced < HPAGE_PMD_NR / 2))) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -1894,7 +1916,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, continue; if (xa_is_value(page)) { - if (++swap > khugepaged_max_ptes_swap) { + if (cc->is_khugepaged && + ++swap > khugepaged_max_ptes_swap) { result = SCAN_EXCEED_SWAP_PTE; count_vm_event(THP_SCAN_EXCEED_SWAP_PTE); break; @@ -1945,7 +1968,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, rcu_read_unlock(); if (result == SCAN_SUCCEED) { - if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) { + if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none && + cc->is_khugepaged) { result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else {

From patchwork Wed Jul 6 23:59:25 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908931
Subject: [mm-unstable v7 07/18] mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:25 -0700
Message-Id: <20220706235936.2197195-8-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>

MADV_COLLAPSE is not coupled to the kernel-oriented sysfs THP settings[1]. hugepage_vma_check() is the authority on determining if a VMA is eligible for THP allocation/collapse, and currently enforces the sysfs THP settings. Add a flag to disable these checks. For now, only apply this arg to anon and file, which use /sys/kernel/transparent_hugepage/enabled. We can expand this to shmem, which uses /sys/kernel/transparent_hugepage/shmem_enabled, later.

Use this flag in collapse_pte_mapped_thp() where previously the VMA flags passed to hugepage_vma_check() were OR'd with VM_HUGEPAGE to elide the VM_HUGEPAGE check in "madvise" THP mode. Prior to "mm: khugepaged: check THP flag in hugepage_vma_check()", this check also didn't check "never" THP mode. As such, this restores the previous behavior of collapse_pte_mapped_thp() where sysfs THP settings are ignored. See comment in code for justification why this is OK.
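For readability, the gate controlled by the new enforce_sysfs argument condenses to roughly the following (a simplified restatement of the hugepage_vma_check() hunk below, not additional kernel code):

	/*
	 * Sketch of the sysfs THP gate inside hugepage_vma_check().
	 * khugepaged and the fault path pass enforce_sysfs=true; the
	 * MADV_COLLAPSE path passes cc->is_khugepaged, i.e. false, and
	 * therefore skips this check entirely.
	 */
	if (enforce_sysfs &&
	    (!hugepage_flags_enabled() ||	/* "never" */
	     (!(vm_flags & VM_HUGEPAGE) &&
	      !hugepage_flags_always())))	/* "madvise" without MADV_HUGEPAGE */
		return false;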
[1] https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/ Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi --- fs/proc/task_mmu.c | 2 +- include/linux/huge_mm.h | 9 ++++----- mm/huge_memory.c | 14 ++++++-------- mm/khugepaged.c | 25 ++++++++++++++----------- mm/memory.c | 4 ++-- 5 files changed, 27 insertions(+), 27 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 34d292cec79a..f8cd58846a28 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -866,7 +866,7 @@ static int show_smap(struct seq_file *m, void *v) __show_smap(m, &mss, false); seq_printf(m, "THPeligible: %d\n", - hugepage_vma_check(vma, vma->vm_flags, true, false)); + hugepage_vma_check(vma, vma->vm_flags, true, false, true)); if (arch_pkeys_enabled()) seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 37f2f11a6d7e..00312fc251c1 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -168,9 +168,8 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); } -bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags, - bool smaps, bool in_pf); +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, + bool smaps, bool in_pf, bool enforce_sysfs); #define transparent_hugepage_use_zero_page() \ (transparent_hugepage_flags & \ @@ -321,8 +320,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, } static inline bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags, - bool smaps, bool in_pf) + unsigned long vm_flags, bool smaps, + bool in_pf, bool enforce_sysfs) { return false; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index da300ce9dedb..4fbe43dc1568 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -69,9 +69,8 @@ static atomic_t huge_zero_refcount; struct page *huge_zero_page __read_mostly; unsigned long huge_zero_pfn __read_mostly = ~0UL; -bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags, - bool smaps, bool in_pf) +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, + bool smaps, bool in_pf, bool enforce_sysfs) { if (!vma->vm_mm) /* vdso */ return false; @@ -120,11 +119,10 @@ bool hugepage_vma_check(struct vm_area_struct *vma, if (!in_pf && shmem_file(vma->vm_file)) return shmem_huge_enabled(vma); - if (!hugepage_flags_enabled()) - return false; - - /* THP settings require madvise. 
*/ - if (!(vm_flags & VM_HUGEPAGE) && !hugepage_flags_always()) + /* Enforce sysfs THP requirements as necessary */ + if (enforce_sysfs && + (!hugepage_flags_enabled() || (!(vm_flags & VM_HUGEPAGE) && + !hugepage_flags_always()))) return false; /* Only regular file is valid */ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d89056d8cbad..b0e20db3f805 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -478,7 +478,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, { if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && hugepage_flags_enabled()) { - if (hugepage_vma_check(vma, vm_flags, false, false)) + if (hugepage_vma_check(vma, vm_flags, false, false, true)) __khugepaged_enter(vma->vm_mm); } } @@ -844,7 +844,8 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) */ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, - struct vm_area_struct **vmap) + struct vm_area_struct **vmap, + struct collapse_control *cc) { struct vm_area_struct *vma; @@ -855,7 +856,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, if (!vma) return SCAN_VMA_NULL; - if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, + cc->is_khugepaged)) return SCAN_VMA_CHECK; /* * Anon VMA expected, the address may be unmapped then @@ -974,7 +976,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, goto out_nolock; mmap_read_lock(mm); - result = hugepage_vma_revalidate(mm, address, &vma); + result = hugepage_vma_revalidate(mm, address, &vma, cc); if (result != SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1006,7 +1008,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, * handled by the anon_vma lock + PG_lock. */ mmap_write_lock(mm); - result = hugepage_vma_revalidate(mm, address, &vma); + result = hugepage_vma_revalidate(mm, address, &vma, cc); if (result != SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -1350,12 +1352,13 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) return; /* - * This vm_flags may not have VM_HUGEPAGE if the page was not - * collapsed by this mm. But we can still collapse if the page is - * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check() - * will not fail the vma for missing VM_HUGEPAGE + * If we are here, we've succeeded in replacing all the native pages + * in the page cache with a single hugepage. If a mm were to fault-in + * this memory (mapped by a suitably aligned VMA), we'd get the hugepage + * and map it by a PMD, regardless of sysfs THP settings. As such, let's + * analogously elide sysfs THP settings here. 
+ */ - if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false)) + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false)) return; /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ @@ -2042,7 +2045,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, progress++; break; } - if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) { + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, true)) { skip: progress++; continue; diff --git a/mm/memory.c b/mm/memory.c index 8917bea2f0bc..96cd776e84f1 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5001,7 +5001,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, return VM_FAULT_OOM; retry_pud: if (pud_none(*vmf.pud) && - hugepage_vma_check(vma, vm_flags, false, true)) { + hugepage_vma_check(vma, vm_flags, false, true, true)) { ret = create_huge_pud(&vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret; @@ -5035,7 +5035,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, goto retry_pud; if (pmd_none(*vmf.pmd) && - hugepage_vma_check(vma, vm_flags, false, true)) { + hugepage_vma_check(vma, vm_flags, false, true, true)) { ret = create_huge_pmd(&vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret;

From patchwork Wed Jul 6 23:59:26 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908932
Subject: [mm-unstable v7 08/18] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:26 -0700
Message-Id: <20220706235936.2197195-9-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>

When scanning an anon pmd to see if it's eligible for collapse, return SCAN_PMD_MAPPED if the pmd already maps a hugepage. Note that SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the file-collapse path, since the latter might identify pte-mapped compound pages. This is required by MADV_COLLAPSE, which necessarily needs to know what hugepage-aligned/sized regions are already pmd-mapped.

In order to determine if a pmd already maps a hugepage, refactor mm_find_pmd():

Return mm_find_pmd() to its pre-commit f72e7dcdd252 ("mm: let mm_find_pmd fix buggy race with THP fault") behavior. ksm was the only caller that explicitly wanted a pte-mapping pmd, so open code the pte-mapping logic there (pmd_present() and pmd_trans_huge() checks).

Undo the change from commit f72e7dcdd252 ("mm: let mm_find_pmd fix buggy race with THP fault") that open-coded the split_huge_pmd_address() pmd lookup, and use mm_find_pmd() instead.
Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi --- include/trace/events/huge_memory.h | 1 + mm/huge_memory.c | 18 +-------- mm/internal.h | 2 +- mm/khugepaged.c | 60 ++++++++++++++++++++++++------ mm/ksm.c | 10 +++++ mm/rmap.c | 15 +++----- 6 files changed, 67 insertions(+), 39 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index d651f3437367..55392bf30a03 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -11,6 +11,7 @@ EM( SCAN_FAIL, "failed") \ EM( SCAN_SUCCEED, "succeeded") \ EM( SCAN_PMD_NULL, "pmd_null") \ + EM( SCAN_PMD_MAPPED, "page_pmd_mapped") \ EM( SCAN_EXCEED_NONE_PTE, "exceed_none_pte") \ EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \ EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4fbe43dc1568..fb76db6c703e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2363,25 +2363,11 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, bool freeze, struct folio *folio) { - pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; + pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); - pgd = pgd_offset(vma->vm_mm, address); - if (!pgd_present(*pgd)) + if (!pmd) return; - p4d = p4d_offset(pgd, address); - if (!p4d_present(*p4d)) - return; - - pud = pud_offset(p4d, address); - if (!pud_present(*pud)) - return; - - pmd = pmd_offset(pud, address); - __split_huge_pmd(vma, pmd, address, freeze, folio); } diff --git a/mm/internal.h b/mm/internal.h index 6e14749ad1e5..ef8c23fb678f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -188,7 +188,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason /* * in mm/rmap.c: */ -extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); +pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); /* * in mm/page_alloc.c diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b0e20db3f805..c7a09cc9a0e8 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -28,6 +28,7 @@ enum scan_result { SCAN_FAIL, SCAN_SUCCEED, SCAN_PMD_NULL, + SCAN_PMD_MAPPED, SCAN_EXCEED_NONE_PTE, SCAN_EXCEED_SWAP_PTE, SCAN_EXCEED_SHARED_PTE, @@ -871,6 +872,45 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, return SCAN_SUCCEED; } +static int find_pmd_or_thp_or_none(struct mm_struct *mm, + unsigned long address, + pmd_t **pmd) +{ + pmd_t pmde; + + *pmd = mm_find_pmd(mm, address); + if (!*pmd) + return SCAN_PMD_NULL; + + pmde = pmd_read_atomic(*pmd); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + /* See comments in pmd_none_or_trans_huge_or_clear_bad() */ + barrier(); +#endif + if (!pmd_present(pmde)) + return SCAN_PMD_NULL; + if (pmd_trans_huge(pmde)) + return SCAN_PMD_MAPPED; + if (pmd_bad(pmde)) + return SCAN_PMD_NULL; + return SCAN_SUCCEED; +} + +static int check_pmd_still_valid(struct mm_struct *mm, + unsigned long address, + pmd_t *pmd) +{ + pmd_t *new_pmd; + int result = find_pmd_or_thp_or_none(mm, address, &new_pmd); + + if (result != SCAN_SUCCEED) + return result; + if (new_pmd != pmd) + return SCAN_FAIL; + return SCAN_SUCCEED; +} + /* * Bring missing pages in from swap, to complete THP collapse. * Only done if khugepaged_scan_pmd believes it is worthwhile. 
@@ -982,9 +1022,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, goto out_nolock; } - pmd = mm_find_pmd(mm, address); - if (!pmd) { - result = SCAN_PMD_NULL; + result = find_pmd_or_thp_or_none(mm, address, &pmd); + if (result != SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; } @@ -1012,7 +1051,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, if (result != SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ - if (mm_find_pmd(mm, address) != pmd) + result = check_pmd_still_valid(mm, address, pmd); + if (result != SCAN_SUCCEED) goto out_up_write; anon_vma_lock_write(vma->anon_vma); @@ -1115,11 +1155,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, VM_BUG_ON(address & ~HPAGE_PMD_MASK); - pmd = mm_find_pmd(mm, address); - if (!pmd) { - result = SCAN_PMD_NULL; + result = find_pmd_or_thp_or_none(mm, address, &pmd); + if (result != SCAN_SUCCEED) goto out; - } memset(cc->node_load, 0, sizeof(cc->node_load)); pte = pte_offset_map_lock(mm, pmd, address, &ptl); @@ -1373,8 +1411,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (!PageHead(hpage)) goto drop_hpage; - pmd = mm_find_pmd(mm, haddr); - if (!pmd) + if (find_pmd_or_thp_or_none(mm, haddr, &pmd) != SCAN_SUCCEED) goto drop_hpage; start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl); @@ -1492,8 +1529,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) if (vma->vm_end < addr + HPAGE_PMD_SIZE) continue; mm = vma->vm_mm; - pmd = mm_find_pmd(mm, addr); - if (!pmd) + if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED) continue; /* * We need exclusive mmap_lock to retract page table. diff --git a/mm/ksm.c b/mm/ksm.c index 075123602bd0..3e0a0a42fa1f 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1136,6 +1136,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, { struct mm_struct *mm = vma->vm_mm; pmd_t *pmd; + pmd_t pmde; pte_t *ptep; pte_t newpte; spinlock_t *ptl; @@ -1150,6 +1151,15 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, pmd = mm_find_pmd(mm, addr); if (!pmd) goto out; + /* + * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at() + * without holding anon_vma lock for write. So when looking for a + * genuine pmde (in which to find pte), test present and !THP together. + */ + pmde = *pmd; + barrier(); + if (!pmd_present(pmde) || pmd_trans_huge(pmde)) + goto out; mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr, addr + PAGE_SIZE); diff --git a/mm/rmap.c b/mm/rmap.c index edc06c52bc82..af775855e58f 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -767,13 +767,17 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma) return vma_address(page, vma); } +/* + * Returns the actual pmd_t* where we expect 'address' to be mapped from, or + * NULL if it doesn't exist. No guarantees / checks on what the pmd_t* + * represents. + */ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) { pgd_t *pgd; p4d_t *p4d; pud_t *pud; pmd_t *pmd = NULL; - pmd_t pmde; pgd = pgd_offset(mm, address); if (!pgd_present(*pgd)) @@ -788,15 +792,6 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) goto out; pmd = pmd_offset(pud, address); - /* - * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at() - * without holding anon_vma lock for write. So when looking for a - * genuine pmde (in which to find pte), test present and !THP together. 
- */ - pmde = *pmd; - barrier(); - if (!pmd_present(pmde) || pmd_trans_huge(pmde)) - pmd = NULL; out: return pmd; }

From patchwork Wed Jul 6 23:59:27 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908933
Subject: [mm-unstable v7 09/18] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:27 -0700
Message-Id: <20220706235936.2197195-10-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>

This idea was introduced by David Rientjes[1].

Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are:

* CPU is charged to the process that wants to spend the cycles for the THP
* Avoid unpredictable timing of khugepaged collapse

Semantics

This call is independent of the system-wide THP sysfs settings, but will fail for memory marked VM_NOHUGEPAGE. If the ranges provided span multiple VMAs, the semantics of the collapse over each VMA are independent from the others. This implies a hugepage cannot cross a VMA boundary. If collapse of a given hugepage-aligned/sized region fails, the operation may continue to attempt collapsing the remainder of the memory specified.
The memory ranges provided must be page-aligned, but are not required to be hugepage-aligned. If the memory ranges are not hugepage-aligned, the start/end of the range will be clamped to the first/last hugepage-aligned address covered by said range. The memory ranges must span at least one hugepage-sized region.

All non-resident pages covered by the range will first be swapped/faulted-in, before being internally copied onto a freshly allocated hugepage. Unmapped pages will have their data directly initialized to 0 in the new hugepage. However, for every eligible hugepage-aligned/sized region to be collapsed, at least one page must currently be backed by memory (a PMD covering the address range must already exist).

Allocation for the new hugepage may enter direct reclaim and/or compaction, regardless of VMA flags. When the system has multiple NUMA nodes, the hugepage will be allocated from the node providing the most native pages.

This operation operates on the current state of the specified process and makes no persistent changes or guarantees on how pages will be mapped, constructed, or faulted in the future.

Return Value

If all hugepage-sized/aligned regions covered by the provided range were either successfully collapsed, or were already PMD-mapped THPs, this operation will be deemed successful. On success, process_madvise(2) returns the number of bytes advised, and madvise(2) returns 0. Else, -1 is returned and errno is set to indicate the error for the most-recently attempted hugepage collapse. Note that many failures might have occurred, since the operation may continue to collapse in the event a single hugepage-sized/aligned region fails.

ENOMEM  Memory allocation failed or VMA not found
EBUSY   Memcg charging failed
EAGAIN  Required resource temporarily unavailable. Trying again might succeed.
EINVAL  Other error: no PMD found, subpage doesn't have the Present bit set, "special" page not backed by struct page, VMA incorrectly sized, address not page-aligned, ...

Most notable here are ENOMEM and EBUSY (new to madvise), which are intended to provide the caller with actionable feedback so they may take an appropriate fallback measure.

Use Cases

Immediate users of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED, zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2].

Only privately-mapped anon memory is supported for now, but additional support for file, shmem, and HugeTLB high-granularity mappings[2] is expected. File and tmpfs/shmem support would permit:

* Backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system, which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevent page sharing and demand paging, both of which increase steady-state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: peak upfront performance and lower RAM footprints.

* Backing guest memory by hugepages after the memory contents have been migrated in native-page-sized chunks to a new host, in a userfaultfd-based live-migration stack.
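As a usage illustration only (not part of this patch): a minimal sketch of how a hugepage-aware allocator might drive the interface, retrying only the transient EAGAIN case and falling back on anything else. The MADV_COLLAPSE fallback define assumes the value 25 added by this series' asm-generic uapi header.

#include <errno.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from this series' uapi headers */
#endif

/* Try to re-back [addr, addr + len) with THPs; addr must be page-aligned. */
static int try_collapse(void *addr, size_t len)
{
	int retries = 3;

	while (madvise(addr, len, MADV_COLLAPSE)) {
		if (errno == EAGAIN && retries--)
			continue;	/* transient: e.g. page lock/LRU contention */
		return -1;		/* ENOMEM, EBUSY, EINVAL, ...: fall back */
	}
	return 0;	/* all covered hugepage regions are now PMD-mapped */
}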
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc Suggested-by: David Rientjes Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi --- arch/alpha/include/uapi/asm/mman.h | 2 + arch/mips/include/uapi/asm/mman.h | 2 + arch/parisc/include/uapi/asm/mman.h | 2 + arch/xtensa/include/uapi/asm/mman.h | 2 + include/linux/huge_mm.h | 14 ++- include/uapi/asm-generic/mman-common.h | 2 + mm/khugepaged.c | 118 ++++++++++++++++++- mm/madvise.c | 5 + tools/include/uapi/asm-generic/mman-common.h | 2 + 9 files changed, 146 insertions(+), 3 deletions(-) diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 4aa996423b0d..763929e814e9 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -76,6 +76,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index 1be428663c10..c6e1fc77c996 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -103,6 +103,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index a7ea3204a5fa..22133a6a506e 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -70,6 +70,8 @@ #define MADV_WIPEONFORK 71 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 72 /* Undo MADV_WIPEONFORK */ +#define MADV_COLLAPSE 73 /* Synchronous hugepage collapse */ + #define MADV_HWPOISON 100 /* poison a page for testing */ #define MADV_SOFT_OFFLINE 101 /* soft offline page for testing */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index 7966a58af472..1ff0c858544f 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -111,6 +111,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 00312fc251c1..39193623442e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -218,6 +218,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); +int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -361,9 +364,16 @@ static inline void split_huge_pmd_address(struct vm_area_struct *vma, static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) { - BUG(); - return 0; + return -EINVAL; } + +static inline int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + return -EINVAL; +} + static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, 
unsigned long end, diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6c1aa92a92e4..6ce1f1ceb432 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -77,6 +77,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c7a09cc9a0e8..2b2d832e44f2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -976,7 +976,8 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, struct collapse_control *cc) { /* Only allocate from the target node */ - gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; + gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() : + GFP_TRANSHUGE) | __GFP_THISNODE; int node = khugepaged_find_target_node(cc); if (!khugepaged_alloc_page(hpage, gfp, node)) @@ -2356,3 +2357,118 @@ void khugepaged_min_free_kbytes_update(void) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } + +static int madvise_collapse_errno(enum scan_result r) +{ + /* + * MADV_COLLAPSE breaks from existing madvise(2) conventions to provide + * actionable feedback to caller, so they may take an appropriate + * fallback measure depending on the nature of the failure. + */ + switch (r) { + case SCAN_ALLOC_HUGE_PAGE_FAIL: + return -ENOMEM; + case SCAN_CGROUP_CHARGE_FAIL: + return -EBUSY; + /* Resource temporary unavailable - trying again might succeed */ + case SCAN_PAGE_LOCK: + case SCAN_PAGE_LRU: + return -EAGAIN; + /* + * Other: Trying again likely not to succeed / error intrinsic to + * specified memory range. khugepaged likely won't be able to collapse + * either. 
+ */ + default: + return -EINVAL; + } +} + +int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + struct collapse_control *cc; + struct mm_struct *mm = vma->vm_mm; + unsigned long hstart, hend, addr; + int thps = 0, last_fail = SCAN_FAIL; + bool mmap_locked = true; + + BUG_ON(vma->vm_start > start); + BUG_ON(vma->vm_end < end); + + cc = kmalloc(sizeof(*cc), GFP_KERNEL); + if (!cc) + return -ENOMEM; + cc->is_khugepaged = false; + cc->last_target_node = NUMA_NO_NODE; + + *prev = vma; + + /* TODO: Support file/shmem */ + if (!vma->anon_vma || !vma_is_anonymous(vma)) + return -EINVAL; + + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; + hend = end & HPAGE_PMD_MASK; + + if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false)) + return -EINVAL; + + mmgrab(mm); + lru_add_drain_all(); + + for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { + int result = SCAN_FAIL; + + if (!mmap_locked) { + cond_resched(); + mmap_read_lock(mm); + mmap_locked = true; + result = hugepage_vma_revalidate(mm, addr, &vma, cc); + if (result != SCAN_SUCCEED) { + last_fail = result; + goto out_nolock; + } + } + mmap_assert_locked(mm); + memset(cc->node_load, 0, sizeof(cc->node_load)); + result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, cc); + if (!mmap_locked) + *prev = NULL; /* Tell caller we dropped mmap_lock */ + + switch (result) { + case SCAN_SUCCEED: + case SCAN_PMD_MAPPED: + ++thps; + break; + /* Whitelisted set of results where continuing OK */ + case SCAN_PMD_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PTE_UFFD_WP: + case SCAN_PAGE_RO: + case SCAN_LACK_REFERENCED_PAGE: + case SCAN_PAGE_NULL: + case SCAN_PAGE_COUNT: + case SCAN_PAGE_LOCK: + case SCAN_PAGE_COMPOUND: + case SCAN_PAGE_LRU: + last_fail = result; + break; + default: + last_fail = result; + /* Other error, exit */ + goto out_maybelock; + } + } + +out_maybelock: + /* Caller expects us to hold mmap_lock on return */ + if (!mmap_locked) + mmap_read_lock(mm); +out_nolock: + mmap_assert_locked(mm); + mmdrop(mm); + + return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0 + : madvise_collapse_errno(last_fail); +} diff --git a/mm/madvise.c b/mm/madvise.c index 851fa4e134bc..9f08e958ea86 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_FREE: case MADV_POPULATE_READ: case MADV_POPULATE_WRITE: + case MADV_COLLAPSE: return 0; default: /* be safe, default to 1. list exceptions explicitly */ @@ -1057,6 +1058,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma, if (error) goto out; break; + case MADV_COLLAPSE: + return madvise_collapse(vma, prev, start, end); } anon_name = anon_vma_name(vma); @@ -1150,6 +1153,7 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: + case MADV_COLLAPSE: #endif case MADV_DONTDUMP: case MADV_DODUMP: @@ -1339,6 +1343,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump. 
diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h index 6c1aa92a92e4..6ce1f1ceb432 100644 --- a/tools/include/uapi/asm-generic/mman-common.h +++ b/tools/include/uapi/asm-generic/mman-common.h @@ -77,6 +77,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0

From patchwork Wed Jul 6 23:59:28 2022
X-Patchwork-Id: 12908934
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:28 -0700
Subject: [mm-unstable v7 10/18] mm/khugepaged: rename prefix of shared collapse functions
Message-Id: <20220706235936.2197195-11-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>

The following functions are shared between khugepaged and madvise collapse contexts. Replace the "khugepaged_" prefix with the generic "hpage_collapse_" prefix in such cases:

khugepaged_test_exit() -> hpage_collapse_test_exit()
khugepaged_scan_abort() -> hpage_collapse_scan_abort()
khugepaged_scan_pmd() -> hpage_collapse_scan_pmd()
khugepaged_find_target_node() -> hpage_collapse_find_target_node()
khugepaged_alloc_page() -> hpage_collapse_alloc_page()

The kernel ABI (e.g.
huge_memory:mm_khugepaged_scan_pmd tracepoint) is unaltered. Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi --- mm/khugepaged.c | 68 +++++++++++++++++++++++++------------------------ 1 file changed, 35 insertions(+), 33 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 2b2d832e44f2..e0d00180512c 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -94,7 +94,7 @@ struct collapse_control { /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; - /* Last target selected in khugepaged_find_target_node() */ + /* Last target selected in hpage_collapse_find_target_node() */ int last_target_node; }; @@ -438,7 +438,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm, hash_add(mm_slots_hash, &mm_slot->hash, (long)mm); } -static inline int khugepaged_test_exit(struct mm_struct *mm) +static inline int hpage_collapse_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) == 0; } @@ -453,7 +453,7 @@ void __khugepaged_enter(struct mm_struct *mm) return; /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); + VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) { free_mm_slot(mm_slot); return; @@ -505,11 +505,10 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * khugepaged_test_exit() (which is guaranteed to run - * under mmap sem read mode). Stop here (after we - * return all pagetables will be destroyed) until - * khugepaged has finished working on the pagetables - * under the mmap_lock. + * hpage_collapse_test_exit() (which is guaranteed to run + * under mmap sem read mode). Stop here (after we return all + * pagetables will be destroyed) until khugepaged has finished + * working on the pagetables under the mmap_lock. */ mmap_write_lock(mm); mmap_write_unlock(mm); @@ -754,13 +753,12 @@ static void khugepaged_alloc_sleep(void) remove_wait_queue(&khugepaged_wait, &wait); } - struct collapse_control khugepaged_collapse_control = { .is_khugepaged = true, .last_target_node = NUMA_NO_NODE, }; -static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) +static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) { int i; @@ -795,7 +793,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void) } #ifdef CONFIG_NUMA -static int khugepaged_find_target_node(struct collapse_control *cc) +static int hpage_collapse_find_target_node(struct collapse_control *cc) { int nid, target_node = 0, max_value = 0; @@ -819,13 +817,13 @@ static int khugepaged_find_target_node(struct collapse_control *cc) return target_node; } #else -static int khugepaged_find_target_node(struct collapse_control *cc) +static int hpage_collapse_find_target_node(struct collapse_control *cc) { return 0; } #endif -static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) +static bool hpage_collapse_alloc_page(struct page **hpage, gfp_t gfp, int node) { *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); if (unlikely(!*hpage)) { @@ -850,7 +848,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, { struct vm_area_struct *vma; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) return SCAN_ANY_PROCESS; *vmap = vma = find_vma(mm, address); @@ -913,7 +911,7 @@ static int check_pmd_still_valid(struct mm_struct *mm, /* * Bring missing pages in from swap, to complete THP collapse. 
- * Only done if khugepaged_scan_pmd believes it is worthwhile. + * Only done if hpage_collapse_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held. * Note that if false is returned, mmap_lock will be released. @@ -978,9 +976,9 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, /* Only allocate from the target node */ gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() : GFP_TRANSHUGE) | __GFP_THISNODE; - int node = khugepaged_find_target_node(cc); + int node = hpage_collapse_find_target_node(cc); - if (!khugepaged_alloc_page(hpage, gfp, node)) + if (!hpage_collapse_alloc_page(hpage, gfp, node)) return SCAN_ALLOC_HUGE_PAGE_FAIL; if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp))) return SCAN_CGROUP_CHARGE_FAIL; @@ -1140,9 +1138,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, return result; } -static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, bool *mmap_locked, - struct collapse_control *cc) +static int hpage_collapse_scan_pmd(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, bool *mmap_locked, + struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1234,7 +1233,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, * hit record. */ node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; goto out_unmap; } @@ -1313,7 +1312,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot) lockdep_assert_held(&khugepaged_mm_lock); - if (khugepaged_test_exit(mm)) { + if (hpage_collapse_test_exit(mm)) { /* free mm_slot */ hash_del(&mm_slot->hash); list_del(&mm_slot->mm_node); @@ -1486,7 +1485,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) if (!mmap_write_trylock(mm)) return; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto out; for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++) @@ -1548,7 +1547,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * it'll always mapped in small page size for uffd-wp * registered ranges. 
*/ - if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) + if (!hpage_collapse_test_exit(mm) && + !userfaultfd_wp(vma)) collapse_and_free_pmd(mm, vma, addr, pmd); mmap_write_unlock(mm); } else { @@ -1975,7 +1975,7 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, } node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; break; } @@ -2069,7 +2069,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, goto breakouterloop_mmap_lock; progress++; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto breakouterloop; address = khugepaged_scan.address; @@ -2078,7 +2078,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, unsigned long hstart, hend; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) { + if (unlikely(hpage_collapse_test_exit(mm))) { progress++; break; } @@ -2099,7 +2099,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, bool mmap_locked = true; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto breakouterloop; VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2116,9 +2116,10 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, mmap_locked = false; fput(file); } else { - *result = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - &mmap_locked, cc); + *result = hpage_collapse_scan_pmd(mm, vma, + khugepaged_scan.address, + &mmap_locked, + cc); } if (*result == SCAN_SUCCEED) ++khugepaged_pages_collapsed; @@ -2148,7 +2149,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. 
*/ - if (khugepaged_test_exit(mm) || !vma) { + if (hpage_collapse_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find @@ -2432,7 +2433,8 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, } mmap_assert_locked(mm); memset(cc->node_load, 0, sizeof(cc->node_load)); - result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, cc); + result = hpage_collapse_scan_pmd(mm, vma, addr, &mmap_locked, + cc); if (!mmap_locked) *prev = NULL; /* Tell caller we dropped mmap_lock */

From patchwork Wed Jul 6 23:59:29 2022
X-Patchwork-Id: 12908935
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:29 -0700
Subject: [mm-unstable v7 11/18] mm/madvise: add huge_memory:mm_madvise_collapse tracepoint
Message-Id: <20220706235936.2197195-12-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>

Add a tracepoint to expose the mm, address, and enum scan_result of each hugepage collapse attempted by a call to madvise(MADV_COLLAPSE).
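For reference, and not part of the patch itself: once applied, the event can be enabled like any other tracepoint. A minimal sketch in C, assuming tracefs is mounted at the usual /sys/kernel/tracing; each attempted collapse then appears in the trace buffer as "mm=... addr=... result=..." per the TP_printk() in the diff below.

#include <fcntl.h>
#include <unistd.h>

/* Path assumes a standard tracefs mount; adjust if mounted elsewhere. */
static int enable_madvise_collapse_event(void)
{
	int fd = open("/sys/kernel/tracing/events/huge_memory/"
		      "mm_madvise_collapse/enable", O_WRONLY);

	if (fd < 0)
		return -1;	/* no tracefs, or kernel without this series */
	if (write(fd, "1", 1) != 1) {
		close(fd);
		return -1;
	}
	return close(fd);
}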
Signed-off-by: Zach O'Keefe --- include/trace/events/huge_memory.h | 22 ++++++++++++++++++++++ mm/khugepaged.c | 2 ++ 2 files changed, 24 insertions(+) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index 55392bf30a03..38d339ffdb16 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -167,5 +167,27 @@ TRACE_EVENT(mm_collapse_huge_page_swapin, __entry->ret) ); +TRACE_EVENT(mm_madvise_collapse, + + TP_PROTO(struct mm_struct *mm, unsigned long addr, int result), + + TP_ARGS(mm, addr, result), + + TP_STRUCT__entry(__field(struct mm_struct *, mm) + __field(unsigned long, addr) + __field(int, result) + ), + + TP_fast_assign(__entry->mm = mm; + __entry->addr = addr; + __entry->result = result; + ), + + TP_printk("mm=%p addr=%#lx result=%s", + __entry->mm, + __entry->addr, + __print_symbolic(__entry->result, SCAN_STATUS)) +); + #endif /* __HUGE_MEMORY_H */ #include <trace/define_trace.h> diff --git a/mm/khugepaged.c b/mm/khugepaged.c index e0d00180512c..0207fc0a5b2a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2438,6 +2438,8 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, if (!mmap_locked) *prev = NULL; /* Tell caller we dropped mmap_lock */ + trace_mm_madvise_collapse(mm, addr, result); + switch (result) { case SCAN_SUCCEED: case SCAN_PMD_MAPPED:

From patchwork Wed Jul 6 23:59:30 2022
X-Patchwork-Id: 12908936
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:30 -0700
Subject: [mm-unstable v7 12/18] mm/madvise: add MADV_COLLAPSE to process_madvise()
Message-Id: <20220706235936.2197195-13-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>

Allow MADV_COLLAPSE behavior for process_madvise(2) if the caller has CAP_SYS_ADMIN or is requesting collapse of its own memory. This is useful for the development of userspace agents that seek to optimize THP utilization system-wide by using userspace signals to prioritize what memory is most deserving of being THP-backed.
Signed-off-by: Zach O'Keefe Acked-by: David Rientjes --- mm/madvise.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 9f08e958ea86..6fb6b7160bda 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1171,13 +1171,15 @@ madvise_behavior_valid(int behavior) } static bool -process_madvise_behavior_valid(int behavior) +process_madvise_behavior_valid(int behavior, struct task_struct *task) { switch (behavior) { case MADV_COLD: case MADV_PAGEOUT: case MADV_WILLNEED: return true; + case MADV_COLLAPSE: + return task == current || capable(CAP_SYS_ADMIN); default: return false; } @@ -1455,7 +1457,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto free_iov; } - if (!process_madvise_behavior_valid(behavior)) { + if (!process_madvise_behavior_valid(behavior, task)) { ret = -EINVAL; goto release_task; }

From patchwork Wed Jul 6 23:59:31 2022
X-Patchwork-Id: 12908937
From: "Zach O'Keefe"
Date: Wed, 6 Jul 2022 16:59:31 -0700
Subject: [mm-unstable v7 13/18] proc/smaps: add PMDMappable field to smaps
Message-Id: <20220706235936.2197195-14-zokeefe@google.com>
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>

Add PMDMappable field to smaps output which informs the
user if memory in the VMA can be PMD-mapped by MADV_COLLAPSE. The distinction from THPeligible is needed for two reasons: 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which THPeligible reports. 2) PMDMappable can also be used in HugeTLB fine-granularity mappings, which are independent from THP. Signed-off-by: Zach O'Keefe --- Documentation/filesystems/proc.rst | 10 ++++++++-- fs/proc/task_mmu.c | 2 ++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 47e95dbc820d..f207903a57a5 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following:: MMUPageSize: 4 kB Locked: 0 kB THPeligible: 0 + PMDMappable: 0 VmFlags: rd ex mr mw me dw The first of these lines shows the same information as is displayed for the @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap. does not take into account swapped out page of underlying shmem objects. "Locked" indicates whether the mapping is locked in memory or not. +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0 +otherwise. It just shows the current status. Note that this is memory +operable on explicitly by MADV_COLLAPSE. + "THPeligible" indicates whether the mapping is eligible for allocating THP -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise. -It just shows the current status. +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0 +otherwise. It just shows the current status. Note this is memory the kernel can +transparently provide as THPs. "VmFlags" field deserves a separate description. 
This member represents the kernel flags associated with the particular virtual memory area in two letter diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f8cd58846a28..29f2089456ba 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v) seq_printf(m, "THPeligible: %d\n", hugepage_vma_check(vma, vma->vm_flags, true, false, true)); + seq_printf(m, "PMDMappable: %d\n", + hugepage_vma_check(vma, vma->vm_flags, true, false, false)); if (arch_pkeys_enabled()) seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); From patchwork Wed Jul 6 23:59:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12908938 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41882C433EF for ; Thu, 7 Jul 2022 00:06:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EC9888E000E; Wed, 6 Jul 2022 20:06:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E52478E0001; Wed, 6 Jul 2022 20:06:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C57578E000E; Wed, 6 Jul 2022 20:06:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id AAE328E0001 for ; Wed, 6 Jul 2022 20:06:30 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 83ED1204AF for ; Thu, 7 Jul 2022 00:06:30 +0000 (UTC) X-FDA: 79658362140.20.D2C971A Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) by imf24.hostedemail.com (Postfix) with ESMTP id 2B7D418000B for ; Thu, 7 Jul 2022 00:06:29 +0000 (UTC) Received: by mail-pg1-f201.google.com with SMTP id 11-20020a63070b000000b00412b2e755d5so472888pgh.19 for ; Wed, 06 Jul 2022 17:06:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=GpwOsMs/zSmEDZBEmaos8HiR4P/IXCATPGigAZlM1es=; b=QI1lm7qQMQFBakBYvs//H1tmJOBAVoo6nn6mJKV/8TA2+SsExh4KswrLHGA3TAK4E5 CM/EVQaxC1JhOiL6UZyG8xplbSSiDn+SRB9rXjn1porhAIWnGq6FMmf8MUNkoyBBb/Qt PUiKLNLH45Zzxr1DaPBuEcaCaJAe8mh/AZ5KLiFPm/ostY61sADY17A9vXzSm+cyfp8w dVwwfDLloOMHBvGbVebZPZKyQYd4JNea5KefWvld17hnNEYwybR5vzvfPn7bRjFb7+Ga a8uT7S5AIeL+3YqouX5GFL4I5UPd96rYELRbDCgLESFcEfIBS9/f5p4GlCZ0xGx0DSiF Fi4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=GpwOsMs/zSmEDZBEmaos8HiR4P/IXCATPGigAZlM1es=; b=OrcATZL0ldsPVM8Xr1t68fbC5h98x/r/n2lXqMqYnk9pFla1HbsbMqAPHim63tiUYn +RdDuqXy8HADH+ynUzTF7iPyLfIWW5uRhwp6A5a1Q0AeQ65H+nb0k8fMLsWyGVkSB/RJ 7urdT6qKCisAM2mnM1JoUzGQbIvsq2oaIswASx6pdYiUBrEYWrHGMZEfQbD0ERqOUu8z dU8MFNYVKd0Ddl4IOSGeZkxDuRmwrsR085z0Zr+MoMyxA5ojY/WbHekNRuHYoQyHhKPp rbv0TlgeRKLrz6MzMA6RVy3aZ+ozcMOlsU8MmhNv9d0pwaIeQcObUgV1E2SpBZOgUFJ0 PczA== X-Gm-Message-State: AJIora+bRwyFkFys2cfxH4hACcl/JdhqLBUpFVWR8eLz8ssK07IQZWZW J8mOhV6qztfRtwxuvzWpUcQt5Swi4skT X-Google-Smtp-Source: AGRyM1v0p/5hwasm56kmmSyAFUgIoF7neQip9W8o4xwhTA3O7nzWQZQ5Kdp9QAPSjowWTjmilY1mHjNZ1X4T 
From patchwork Wed Jul 6 23:59:32 2022
Date: Wed, 6 Jul 2022 16:59:32 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-15-zokeefe@google.com>
Subject: [mm-unstable v7 14/18] selftests/vm: modularize collapse selftests
From: "Zach O'Keefe"

Modularize the collapse action of the khugepaged collapse selftests by
introducing a struct collapse_context which specifies how to collapse a
given memory range and the expected semantics of the collapse. This can
be reused later to test other collapse contexts.
Additionally, all tests have logic that checks whether a collapse occurred by
reading /proc/self/smaps, and reports if the result differs from what was
expected. Move this logic into the per-context ->collapse() hook instead of
repeating it in every test.

Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 251 +++++++++++-------------
 1 file changed, 110 insertions(+), 141 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 155120b67a16..0f1bee0eff24 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -23,6 +23,11 @@ static int hpage_pmd_nr;
 #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
 #define PID_SMAPS "/proc/self/smaps"
 
+struct collapse_context {
+	void (*collapse)(const char *msg, char *p, bool expect);
+	bool enforce_pte_scan_limits;
+};
+
 enum thp_enabled {
 	THP_ALWAYS,
 	THP_MADVISE,
@@ -501,6 +506,21 @@ static bool wait_for_scan(const char *msg, char *p)
 	return timeout == -1;
 }
 
+static void khugepaged_collapse(const char *msg, char *p, bool expect)
+{
+	if (wait_for_scan(msg, p)) {
+		if (expect)
+			fail("Timeout");
+		else
+			success("OK");
+		return;
+	} else if (check_huge(p) == expect) {
+		success("OK");
+	} else {
+		fail("Fail");
+	}
+}
+
 static void alloc_at_fault(void)
 {
 	struct settings settings = default_settings;
@@ -528,53 +548,39 @@ static void alloc_at_fault(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full(void)
+static void collapse_full(struct collapse_context *c)
 {
 	void *p;
 
 	p = alloc_mapping();
 	fill_memory(p, 0, hpage_pmd_size);
-	if (wait_for_scan("Collapse fully populated PTE table", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Collapse fully populated PTE table", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_empty(void)
+static void collapse_empty(struct collapse_context *c)
 {
 	void *p;
 
 	p = alloc_mapping();
-	if (wait_for_scan("Do not collapse empty PTE table", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	c->collapse("Do not collapse empty PTE table", p, false);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry(void)
+static void collapse_single_pte_entry(struct collapse_context *c)
 {
 	void *p;
 
 	p = alloc_mapping();
 	fill_memory(p, 0, page_size);
-	if (wait_for_scan("Collapse PTE table with single PTE entry present", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Collapse PTE table with single PTE entry present", p,
+		    true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_none(void)
+static void collapse_max_ptes_none(struct collapse_context *c)
 {
 	int max_ptes_none = hpage_pmd_nr / 2;
 	struct settings settings = default_settings;
@@ -586,28 +592,22 @@ static void collapse_max_ptes_none(void)
 
 	p = alloc_mapping();
 	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
-	if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	c->collapse("Maybe collapse with max_ptes_none exceeded", p,
+		    !c->enforce_pte_scan_limits);
 	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
 
-	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
-	if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+	if (c->enforce_pte_scan_limits) {
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+		c->collapse("Collapse with max_ptes_none PTEs empty", p, true);
+		validate_memory(p, 0,
+				(hpage_pmd_nr - max_ptes_none) * page_size);
+	}
 
 	munmap(p, hpage_pmd_size);
 	write_settings(&default_settings);
 }
 
-static void collapse_swapin_single_pte(void)
+static void collapse_swapin_single_pte(struct collapse_context *c)
 {
 	void *p;
 	p = alloc_mapping();
@@ -625,18 +625,13 @@ static void collapse_swapin_single_pte(void)
 		goto out;
 	}
 
-	if (wait_for_scan("Collapse with swapping in single PTE entry", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Collapse with swapping in single PTE entry", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 out:
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_swap(void)
+static void collapse_max_ptes_swap(struct collapse_context *c)
 {
 	int max_ptes_swap = read_num("khugepaged/max_ptes_swap");
 	void *p;
@@ -656,39 +651,34 @@ static void collapse_max_ptes_swap(void)
 		goto out;
 	}
 
-	if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	c->collapse("Maybe collapse with max_ptes_swap exceeded", p,
+		    !c->enforce_pte_scan_limits);
 	validate_memory(p, 0, hpage_pmd_size);
 
-	fill_memory(p, 0, hpage_pmd_size);
-	printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr);
-	if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
-		perror("madvise(MADV_PAGEOUT)");
-		exit(EXIT_FAILURE);
-	}
-	if (check_swap(p, max_ptes_swap * page_size)) {
-		success("OK");
-	} else {
-		fail("Fail");
-		goto out;
-	}
+	if (c->enforce_pte_scan_limits) {
+		fill_memory(p, 0, hpage_pmd_size);
+		printf("Swapout %d of %d pages...", max_ptes_swap,
+		       hpage_pmd_nr);
+		if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
+			perror("madvise(MADV_PAGEOUT)");
+			exit(EXIT_FAILURE);
+		}
+		if (check_swap(p, max_ptes_swap * page_size)) {
+			success("OK");
+		} else {
+			fail("Fail");
+			goto out;
+		}
 
-	if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-	validate_memory(p, 0, hpage_pmd_size);
+		c->collapse("Collapse with max_ptes_swap pages swapped out", p,
+			    true);
+		validate_memory(p, 0, hpage_pmd_size);
+	}
 out:
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry_compound(void)
+static void collapse_single_pte_entry_compound(struct collapse_context *c)
 {
 	void *p;
 
@@ -710,17 +700,13 @@ static void collapse_single_pte_entry_compound(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Collapse PTE table with single PTE mapping compound page",
+		    p, true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full_of_compound(void)
+static void collapse_full_of_compound(struct collapse_context *c)
 {
 	void *p;
 
@@ -742,17 +728,12 @@ static void collapse_full_of_compound(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table full of compound pages", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Collapse PTE table full of compound pages", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_compound_extreme(void)
+static void collapse_compound_extreme(struct collapse_context *c)
 {
 	void *p;
 	int i;
@@ -798,18 +779,14 @@ static void collapse_compound_extreme(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table full of different compound pages", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Collapse PTE table full of different compound pages", p,
+		    true);
 
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork(void)
+static void collapse_fork(struct collapse_context *c)
 {
 	int wstatus;
 	void *p;
@@ -835,13 +812,8 @@ static void collapse_fork(void)
 			fail("Fail");
 
 		fill_memory(p, page_size, 2 * page_size);
-
-		if (wait_for_scan("Collapse PTE table with single page shared with parent process", p))
-			fail("Timeout");
-		else if (check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
+		c->collapse("Collapse PTE table with single page shared with parent process",
+			    p, true);
 
 		validate_memory(p, 0, page_size);
 		munmap(p, hpage_pmd_size);
@@ -860,7 +832,7 @@ static void collapse_fork(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork_compound(void)
+static void collapse_fork_compound(struct collapse_context *c)
 {
 	int wstatus;
 	void *p;
@@ -896,14 +868,10 @@ static void collapse_fork_compound(void)
 		fill_memory(p, 0, page_size);
 
 		write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1);
-		if (wait_for_scan("Collapse PTE table full of compound pages in child", p))
-			fail("Timeout");
-		else if (check_huge(p))
-			success("OK");
-		else
-			fail("Fail");
+		c->collapse("Collapse PTE table full of compound pages in child",
+			    p, true);
 		write_num("khugepaged/max_ptes_shared",
-			default_settings.khugepaged.max_ptes_shared);
+			  default_settings.khugepaged.max_ptes_shared);
 
 		validate_memory(p, 0, hpage_pmd_size);
 		munmap(p, hpage_pmd_size);
@@ -922,7 +890,7 @@ static void collapse_fork_compound(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_shared()
+static void collapse_max_ptes_shared(struct collapse_context *c)
 {
 	int max_ptes_shared = read_num("khugepaged/max_ptes_shared");
 	int wstatus;
@@ -957,28 +925,22 @@ static void collapse_max_ptes_shared()
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p))
-		fail("Timeout");
-	else if (!check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
-	printf("Trigger CoW on page %d of %d...",
-	       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
-	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size);
-	if (!check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
-
-	if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	c->collapse("Maybe collapse with max_ptes_shared exceeded", p,
+		    !c->enforce_pte_scan_limits);
+
+	if (c->enforce_pte_scan_limits) {
+		printf("Trigger CoW on page %d of %d...",
+		       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) *
+			    page_size);
+		if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		c->collapse("Collapse with max_ptes_shared PTEs shared",
+			    p, true);
+	}
 
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
@@ -999,6 +961,8 @@ static void collapse_max_ptes_shared()
 
 int main(void)
 {
+	struct collapse_context c;
+
 	setbuf(stdout, NULL);
 
 	page_size = getpagesize();
@@ -1014,18 +978,23 @@ int main(void)
 	adjust_settings();
 
 	alloc_at_fault();
-	collapse_full();
-	collapse_empty();
-	collapse_single_pte_entry();
-	collapse_max_ptes_none();
-	collapse_swapin_single_pte();
-	collapse_max_ptes_swap();
-	collapse_single_pte_entry_compound();
-	collapse_full_of_compound();
-	collapse_compound_extreme();
-	collapse_fork();
-	collapse_fork_compound();
-	collapse_max_ptes_shared();
+
+	printf("\n*** Testing context: khugepaged ***\n");
+	c.collapse = &khugepaged_collapse;
+	c.enforce_pte_scan_limits = true;
+
+	collapse_full(&c);
+	collapse_empty(&c);
+	collapse_single_pte_entry(&c);
+	collapse_max_ptes_none(&c);
+	collapse_swapin_single_pte(&c);
+	collapse_max_ptes_swap(&c);
+	collapse_single_pte_entry_compound(&c);
+	collapse_full_of_compound(&c);
+	collapse_compound_extreme(&c);
+	collapse_fork(&c);
+	collapse_fork_compound(&c);
+	collapse_max_ptes_shared(&c);
 
 	restore_settings(0);
 }
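For illustration only, a second context beyond khugepaged needs nothing more
than its own ->collapse() implementation; the sketch below reuses the
selftest's own helpers. The no-op context is hypothetical and not part of
this series (the madvise context added later in the series is the real second
user):

	/* Hypothetical no-op context: performs no collapse action and only
	 * verifies the current state against the expectation. */
	static void noop_collapse(const char *msg, char *p, bool expect)
	{
		printf("%s...", msg);
		if (check_huge(p) == expect)
			success("OK");
		else
			fail("Fail");
	}

	struct collapse_context noop = {
		.collapse = &noop_collapse,
		.enforce_pte_scan_limits = false,
	};

	collapse_empty(&noop);	/* passes: nothing collapses an empty table */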
From patchwork Wed Jul 6 23:59:33 2022
Date: Wed, 6 Jul 2022 16:59:33 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-16-zokeefe@google.com>
Subject: [mm-unstable v7 15/18] selftests/vm: dedup hugepage allocation logic
From: "Zach O'Keefe"

The code

	p = alloc_mapping();
	printf("Allocate huge page...");
	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
	fill_memory(p, 0, hpage_pmd_size);
	if (check_huge(p))
		success("OK");
	else
		fail("Fail");

is repeated many times in different tests. Add a helper, alloc_hpage(),
to handle this.

Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 62 +++++++++----------------
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 0f1bee0eff24..eb6f5bbacff1 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -461,6 +461,25 @@ static void fill_memory(int *p, unsigned long start, unsigned long end)
 		p[i * page_size / sizeof(*p)] = i + 0xdead0000;
 }
 
+/*
+ * Returns pmd-mapped hugepage in VMA marked VM_HUGEPAGE, filled with
+ * validate_memory()'able contents.
+ */
+static void *alloc_hpage(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+	printf("Allocate huge page...");
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	fill_memory(p, 0, hpage_pmd_size);
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	return p;
+}
+
 static void validate_memory(int *p, unsigned long start, unsigned long end)
 {
 	int i;
@@ -682,15 +701,7 @@ static void collapse_single_pte_entry_compound(struct collapse_context *c)
 {
 	void *p;
 
-	p = alloc_mapping();
-
-	printf("Allocate huge page...");
-	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
-	fill_memory(p, 0, hpage_pmd_size);
-	if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	p = alloc_hpage();
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
 
 	printf("Split huge page leaving single PTE mapping compound page...");
@@ -710,16 +721,7 @@ static void collapse_full_of_compound(struct collapse_context *c)
 {
 	void *p;
 
-	p = alloc_mapping();
-
-	printf("Allocate huge page...");
-	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
-	fill_memory(p, 0, hpage_pmd_size);
-	if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
+	p = alloc_hpage();
 	printf("Split huge page leaving single PTE page table full of compound pages...");
 	madvise(p, page_size, MADV_NOHUGEPAGE);
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
@@ -837,16 +839,7 @@ static void collapse_fork_compound(struct collapse_context *c)
 	int wstatus;
 	void *p;
 
-	p = alloc_mapping();
-
-	printf("Allocate huge page...");
-	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
-	fill_memory(p, 0, hpage_pmd_size);
-	if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
+	p = alloc_hpage();
 	printf("Share huge page over fork()...");
 	if (!fork()) {
 		/* Do not touch settings on child exit */
@@ -896,16 +889,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c)
 	int wstatus;
 	void *p;
 
-	p = alloc_mapping();
-
-	printf("Allocate huge page...");
-	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
-	fill_memory(p, 0, hpage_pmd_size);
-	if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
+	p = alloc_hpage();
 	printf("Share huge page over fork()...");
 	if (!fork()) {
 		/* Do not touch settings on child exit */
From patchwork Wed Jul 6 23:59:34 2022
Date: Wed, 6 Jul 2022 16:59:34 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-17-zokeefe@google.com>
Subject: [mm-unstable v7 16/18] selftests/vm: add MADV_COLLAPSE collapse context to selftests
From: "Zach O'Keefe"

Add madvise collapse context to hugepage collapse selftests. This
context is tested with /sys/kernel/mm/transparent_hugepage/enabled set
to "never" in order to avoid unwanted interaction with khugepaged during
testing.

Also, refactor updates to sysfs THP settings using a stack so that the
THP settings from nested callers can be restored.
Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 171 +++++++++++++++++-------
 1 file changed, 125 insertions(+), 46 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index eb6f5bbacff1..780f04440e15 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -14,6 +14,9 @@
 #ifndef MADV_PAGEOUT
 #define MADV_PAGEOUT 21
 #endif
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
 
 #define BASE_ADDR ((void *)(1UL << 30))
 static unsigned long hpage_pmd_size;
@@ -95,18 +98,6 @@ struct settings {
 	struct khugepaged_settings khugepaged;
 };
 
-static struct settings default_settings = {
-	.thp_enabled = THP_MADVISE,
-	.thp_defrag = THP_DEFRAG_ALWAYS,
-	.shmem_enabled = SHMEM_NEVER,
-	.use_zero_page = 0,
-	.khugepaged = {
-		.defrag = 1,
-		.alloc_sleep_millisecs = 10,
-		.scan_sleep_millisecs = 10,
-	},
-};
-
 static struct settings saved_settings;
 static bool skip_settings_restore;
 
@@ -284,6 +275,39 @@ static void write_settings(struct settings *settings)
 	write_num("khugepaged/pages_to_scan", khugepaged->pages_to_scan);
 }
 
+#define MAX_SETTINGS_DEPTH 4
+static struct settings settings_stack[MAX_SETTINGS_DEPTH];
+static int settings_index;
+
+static struct settings *current_settings(void)
+{
+	if (!settings_index) {
+		printf("Fail: No settings set");
+		exit(EXIT_FAILURE);
+	}
+	return settings_stack + settings_index - 1;
+}
+
+static void push_settings(struct settings *settings)
+{
+	if (settings_index >= MAX_SETTINGS_DEPTH) {
+		printf("Fail: Settings stack exceeded");
+		exit(EXIT_FAILURE);
+	}
+	settings_stack[settings_index++] = *settings;
+	write_settings(current_settings());
+}
+
+static void pop_settings(void)
+{
+	if (settings_index <= 0) {
+		printf("Fail: Settings stack empty");
+		exit(EXIT_FAILURE);
+	}
+	--settings_index;
+	write_settings(current_settings());
+}
+
 static void restore_settings(int sig)
 {
 	if (skip_settings_restore)
@@ -327,14 +351,6 @@ static void save_settings(void)
 	signal(SIGQUIT, restore_settings);
 }
 
-static void adjust_settings(void)
-{
-
-	printf("Adjust settings...");
-	write_settings(&default_settings);
-	success("OK");
-}
-
 #define MAX_LINE_LENGTH 500
 
 static bool check_for_pattern(FILE *fp, char *pattern, char *buf)
@@ -493,6 +509,38 @@ static void validate_memory(int *p, unsigned long start, unsigned long end)
 	}
 }
 
+static void madvise_collapse(const char *msg, char *p, bool expect)
+{
+	int ret;
+	struct settings settings = *current_settings();
+
+	printf("%s...", msg);
+	/* Sanity check */
+	if (check_huge(p)) {
+		printf("Unexpected huge page\n");
+		exit(EXIT_FAILURE);
+	}
+
+	/*
+	 * Prevent khugepaged interference and tests that MADV_COLLAPSE
+	 * ignores /sys/kernel/mm/transparent_hugepage/enabled
+	 */
+	settings.thp_enabled = THP_NEVER;
+	push_settings(&settings);
+
+	/* Clear VM_NOHUGEPAGE */
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	ret = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+	if (((bool)ret) == expect)
+		fail("Fail: Bad return value");
+	else if (check_huge(p) != expect)
+		fail("Fail: check_huge()");
+	else
+		success("OK");
+
+	pop_settings();
+}
+
 #define TICK 500000
 static bool wait_for_scan(const char *msg, char *p)
 {
@@ -542,11 +590,11 @@ static void khugepaged_collapse(const char *msg, char *p, bool expect)
 
 static void alloc_at_fault(void)
 {
-	struct settings settings = default_settings;
+	struct settings settings = *current_settings();
 	char *p;
 
 	settings.thp_enabled = THP_ALWAYS;
-	write_settings(&settings);
+	push_settings(&settings);
 
 	p = alloc_mapping();
 	*p = 1;
@@ -556,7 +604,7 @@ static void alloc_at_fault(void)
 	else
 		fail("Fail");
 
-	write_settings(&default_settings);
+	pop_settings();
 
 	madvise(p, page_size, MADV_DONTNEED);
 	printf("Split huge PMD on MADV_DONTNEED...");
@@ -602,11 +650,11 @@ static void collapse_single_pte_entry(struct collapse_context *c)
 static void collapse_max_ptes_none(struct collapse_context *c)
 {
 	int max_ptes_none = hpage_pmd_nr / 2;
-	struct settings settings = default_settings;
+	struct settings settings = *current_settings();
 	void *p;
 
 	settings.khugepaged.max_ptes_none = max_ptes_none;
-	write_settings(&settings);
+	push_settings(&settings);
 
 	p = alloc_mapping();
 
@@ -623,7 +671,7 @@ static void collapse_max_ptes_none(struct collapse_context *c)
 	}
 
 	munmap(p, hpage_pmd_size);
-	write_settings(&default_settings);
+	pop_settings();
 }
 
 static void collapse_swapin_single_pte(struct collapse_context *c)
@@ -703,7 +751,6 @@ static void collapse_single_pte_entry_compound(struct collapse_context *c)
 
 	p = alloc_hpage();
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
-
 	printf("Split huge page leaving single PTE mapping compound page...");
 	madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED);
 
 	if (!check_huge(p))
@@ -864,7 +911,7 @@ static void collapse_fork_compound(struct collapse_context *c)
 		c->collapse("Collapse PTE table full of compound pages in child",
 			    p, true);
 		write_num("khugepaged/max_ptes_shared",
-			  default_settings.khugepaged.max_ptes_shared);
+			  current_settings()->khugepaged.max_ptes_shared);
 
 		validate_memory(p, 0, hpage_pmd_size);
 		munmap(p, hpage_pmd_size);
@@ -943,9 +990,21 @@ static void collapse_max_ptes_shared(struct collapse_context *c)
 	munmap(p, hpage_pmd_size);
 }
 
-int main(void)
+int main(int argc, const char **argv)
 {
 	struct collapse_context c;
+	struct settings default_settings = {
+		.thp_enabled = THP_MADVISE,
+		.thp_defrag = THP_DEFRAG_ALWAYS,
+		.shmem_enabled = SHMEM_NEVER,
+		.use_zero_page = 0,
+		.khugepaged = {
+			.defrag = 1,
+			.alloc_sleep_millisecs = 10,
+			.scan_sleep_millisecs = 10,
+		},
+	};
+	const char *tests = argc == 1 ? "all" : argv[1];
 
 	setbuf(stdout, NULL);
 
@@ -959,26 +1018,46 @@ int main(void)
 	default_settings.khugepaged.pages_to_scan = hpage_pmd_nr * 8;
 
 	save_settings();
-	adjust_settings();
+	push_settings(&default_settings);
 
 	alloc_at_fault();
 
-	printf("\n*** Testing context: khugepaged ***\n");
-	c.collapse = &khugepaged_collapse;
-	c.enforce_pte_scan_limits = true;
-
-	collapse_full(&c);
-	collapse_empty(&c);
-	collapse_single_pte_entry(&c);
-	collapse_max_ptes_none(&c);
-	collapse_swapin_single_pte(&c);
-	collapse_max_ptes_swap(&c);
-	collapse_single_pte_entry_compound(&c);
-	collapse_full_of_compound(&c);
-	collapse_compound_extreme(&c);
-	collapse_fork(&c);
-	collapse_fork_compound(&c);
-	collapse_max_ptes_shared(&c);
+	if (!strcmp(tests, "khugepaged") || !strcmp(tests, "all")) {
+		printf("\n*** Testing context: khugepaged ***\n");
+		c.collapse = &khugepaged_collapse;
+		c.enforce_pte_scan_limits = true;
+
+		collapse_full(&c);
+		collapse_empty(&c);
+		collapse_single_pte_entry(&c);
+		collapse_max_ptes_none(&c);
+		collapse_swapin_single_pte(&c);
+		collapse_max_ptes_swap(&c);
+		collapse_single_pte_entry_compound(&c);
+		collapse_full_of_compound(&c);
+		collapse_compound_extreme(&c);
+		collapse_fork(&c);
+		collapse_fork_compound(&c);
+		collapse_max_ptes_shared(&c);
+	}
+	if (!strcmp(tests, "madvise") || !strcmp(tests, "all")) {
+		printf("\n*** Testing context: madvise ***\n");
+		c.collapse = &madvise_collapse;
+		c.enforce_pte_scan_limits = false;
+
+		collapse_full(&c);
+		collapse_empty(&c);
+		collapse_single_pte_entry(&c);
+		collapse_max_ptes_none(&c);
+		collapse_swapin_single_pte(&c);
+		collapse_max_ptes_swap(&c);
+		collapse_single_pte_entry_compound(&c);
+		collapse_full_of_compound(&c);
+		collapse_compound_extreme(&c);
+		collapse_fork(&c);
+		collapse_fork_compound(&c);
+		collapse_max_ptes_shared(&c);
+	}
 
 	restore_settings(0);
 }
From patchwork Wed Jul 6 23:59:35 2022
Date: Wed, 6 Jul 2022 16:59:35 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-18-zokeefe@google.com>
Subject: [mm-unstable v7 17/18] selftests/vm: add selftest to verify recollapse of THPs
From: "Zach O'Keefe"

Add a selftest, specific to the madvise collapse context, that tests
that MADV_COLLAPSE is "successful" if a hugepage-aligned/sized region is
already pmd-mapped. This test also verifies that MADV_COLLAPSE can
collapse memory into THPs even in "madvise" THP mode when the memory
isn't marked VM_HUGEPAGE.

Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 31 +++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 780f04440e15..87cd0b99477f 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -990,6 +990,36 @@ static void collapse_max_ptes_shared(struct collapse_context *c)
 	munmap(p, hpage_pmd_size);
 }
 
+static void madvise_collapse_existing_thps(void)
+{
+	void *p;
+	int err;
+
+	p = alloc_mapping();
+	fill_memory(p, 0, hpage_pmd_size);
+
+	printf("Collapse fully populated PTE table...");
+	/*
+	 * Note that we don't set MADV_HUGEPAGE here, which
+	 * also tests that VM_HUGEPAGE isn't required for
+	 * MADV_COLLAPSE in "madvise" mode.
+	 */
+	err = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+	if (err == 0 && check_huge(p)) {
+		success("OK");
+		printf("Re-collapse PMD-mapped hugepage");
+		err = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+		if (err == 0 && check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+	} else {
+		fail("Fail");
+	}
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
 int main(int argc, const char **argv)
 {
 	struct collapse_context c;
@@ -1057,6 +1087,7 @@ int main(int argc, const char **argv)
 		collapse_fork(&c);
 		collapse_fork_compound(&c);
 		collapse_max_ptes_shared(&c);
+		madvise_collapse_existing_thps();
 	}
 
 	restore_settings(0);
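From a userspace point of view, the semantics under test reduce to a short
fragment (a sketch, assuming the fill_memory()/hpage_pmd_size definitions
from the selftest and a kernel with MADV_COLLAPSE support):

	void *p = mmap(NULL, hpage_pmd_size, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	fill_memory(p, 0, hpage_pmd_size);	/* no MADV_HUGEPAGE beforehand */
	if (!madvise(p, hpage_pmd_size, MADV_COLLAPSE)) {
		/* A second call over the now PMD-mapped range is expected
		 * to report success as well, without re-doing any work. */
		int again = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
		/* again == 0 */
	}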
From patchwork Wed Jul 6 23:59:36 2022
Date: Wed, 6 Jul 2022 16:59:36 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-19-zokeefe@google.com>
Subject: [mm-unstable v7 18/18] selftests/vm: add selftest to verify multi THP collapse
From: "Zach O'Keefe"

Add support to allocate and verify collapse of multiple hugepage-sized
regions into multiple THPs.

Add an "nr_hpages" argument to check_huge() that instructs it to check
for exactly "nr_hpages" THPs. This has the added benefit of being able
to check for exactly 0 THPs, so callsites that previously checked the
negation of exactly 1 THP are now more correct.

The ->collapse hook of struct collapse_context has been expanded with a
"nr_hpages" argument to collapse "nr_hpages" hugepages. The
collapse_full() test has been repurposed to collapse 4 THPs at once. It
is expected that more tests will want to exercise multi-THP collapse
(e.g. file/shmem). This is of particular benefit to the madvise collapse
context, given that it may do many THP collapses during a single
syscall.

Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 140 ++++++++++++------------
 1 file changed, 73 insertions(+), 67 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 87cd0b99477f..b77b1e28cdb3 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -27,7 +27,7 @@ static int hpage_pmd_nr;
 #define PID_SMAPS "/proc/self/smaps"
 
 struct collapse_context {
-	void (*collapse)(const char *msg, char *p, bool expect);
+	void (*collapse)(const char *msg, char *p, int nr_hpages, bool expect);
 	bool enforce_pte_scan_limits;
 };
 
@@ -362,7 +362,7 @@ static bool check_for_pattern(FILE *fp, char *pattern, char *buf)
 	return false;
 }
 
-static bool check_huge(void *addr)
+static bool check_huge(void *addr, int nr_hpages)
 {
 	bool thp = false;
 	int ret;
@@ -387,7 +387,7 @@ static bool check_huge(void *addr)
 		goto err_out;
 
 	ret = snprintf(addr_pattern, MAX_LINE_LENGTH, "AnonHugePages:%10ld kB",
-		       hpage_pmd_size >> 10);
+		       nr_hpages * (hpage_pmd_size >> 10));
 	if (ret >= MAX_LINE_LENGTH) {
 		printf("%s: Pattern is too long\n", __func__);
 		exit(EXIT_FAILURE);
@@ -455,12 +455,12 @@ static bool check_swap(void *addr, unsigned long size)
 	return swap;
 }
 
-static void *alloc_mapping(void)
+static void *alloc_mapping(int nr)
 {
 	void *p;
 
-	p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE,
-		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	p = mmap(BASE_ADDR, nr * hpage_pmd_size, PROT_READ | PROT_WRITE,
+		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 	if (p != BASE_ADDR) {
 		printf("Failed to allocate VMA at %p\n", BASE_ADDR);
 		exit(EXIT_FAILURE);
@@ -485,11 +485,11 @@ static void *alloc_hpage(void)
 {
 	void *p;
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 	printf("Allocate huge page...");
 	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
 	fill_memory(p, 0, hpage_pmd_size);
-	if (check_huge(p))
+	if (check_huge(p, 1))
 		success("OK");
 	else
 		fail("Fail");
@@ -509,14 +509,15 @@ static void validate_memory(int *p, unsigned long start, unsigned long end)
 	}
 }
 
-static void madvise_collapse(const char *msg, char *p, bool expect)
+static void madvise_collapse(const char *msg, char *p, int nr_hpages,
+			     bool expect)
 {
 	int ret;
 	struct settings settings = *current_settings();
 
 	printf("%s...", msg);
 	/* Sanity check */
-	if (check_huge(p)) {
+	if (!check_huge(p, 0)) {
 		printf("Unexpected huge page\n");
 		exit(EXIT_FAILURE);
 	}
@@ -529,11 +530,11 @@ static void madvise_collapse(const char *msg, char *p, bool expect)
 	push_settings(&settings);
 
 	/* Clear VM_NOHUGEPAGE */
-	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
-	ret = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+	madvise(p, nr_hpages * hpage_pmd_size, MADV_HUGEPAGE);
+	ret = madvise(p, nr_hpages * hpage_pmd_size, MADV_COLLAPSE);
 	if (((bool)ret) == expect)
 		fail("Fail: Bad return value");
-	else if (check_huge(p) != expect)
+	else if (check_huge(p, nr_hpages) != expect)
 		fail("Fail: check_huge()");
 	else
 		success("OK");
@@ -542,25 +543,25 @@ static void madvise_collapse(const char *msg, char *p, bool expect)
 }
 
 #define TICK 500000
-static bool wait_for_scan(const char *msg, char *p)
+static bool wait_for_scan(const char *msg, char *p, int nr_hpages)
 {
 	int full_scans;
 	int timeout = 6; /* 3 seconds */
 
 	/* Sanity check */
-	if (check_huge(p)) {
+	if (!check_huge(p, 0)) {
 		printf("Unexpected huge page\n");
 		exit(EXIT_FAILURE);
 	}
 
-	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	madvise(p, nr_hpages * hpage_pmd_size, MADV_HUGEPAGE);
 
 	/* Wait until the second full_scan completed */
 	full_scans = read_num("khugepaged/full_scans") + 2;
 
 	printf("%s...", msg);
 	while (timeout--) {
-		if (check_huge(p))
+		if (check_huge(p, nr_hpages))
 			break;
 		if (read_num("khugepaged/full_scans") >= full_scans)
 			break;
@@ -568,20 +569,21 @@ static bool wait_for_scan(const char *msg, char *p)
 		usleep(TICK);
 	}
 
-	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
+	madvise(p, nr_hpages * hpage_pmd_size, MADV_NOHUGEPAGE);
 
 	return timeout == -1;
 }
 
-static void khugepaged_collapse(const char *msg, char *p, bool expect)
+static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
+				bool expect)
 {
-	if (wait_for_scan(msg, p)) {
+	if (wait_for_scan(msg, p, nr_hpages)) {
 		if (expect)
 			fail("Timeout");
 		else
 			success("OK");
 		return;
-	} else if (check_huge(p) == expect) {
+	} else if (check_huge(p, nr_hpages) == expect) {
 		success("OK");
 	} else {
 		fail("Fail");
@@ -596,10 +598,10 @@ static void alloc_at_fault(void)
 	settings.thp_enabled = THP_ALWAYS;
 	push_settings(&settings);
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 	*p = 1;
 	printf("Allocate huge page on fault...");
-	if (check_huge(p))
+	if (check_huge(p, 1))
 		success("OK");
 	else
 		fail("Fail");
@@ -608,7 +610,7 @@ static void alloc_at_fault(void)
 
 	madvise(p, page_size, MADV_DONTNEED);
 	printf("Split huge PMD on MADV_DONTNEED...");
-	if (!check_huge(p))
+	if (check_huge(p, 0))
 		success("OK");
 	else
 		fail("Fail");
@@ -618,20 +620,23 @@ static void alloc_at_fault(void)
 static void collapse_full(struct collapse_context *c)
 {
 	void *p;
+	int nr_hpages = 4;
+	unsigned long size = nr_hpages * hpage_pmd_size;
 
-	p = alloc_mapping();
-	fill_memory(p, 0, hpage_pmd_size);
-	c->collapse("Collapse fully populated PTE table", p, true);
-	validate_memory(p, 0, hpage_pmd_size);
-	munmap(p, hpage_pmd_size);
+	p = alloc_mapping(nr_hpages);
+	fill_memory(p, 0, size);
+	c->collapse("Collapse multiple fully populated PTE table", p, nr_hpages,
+		    true);
+	validate_memory(p, 0, size);
+	munmap(p, size);
 }
 
 static void collapse_empty(struct collapse_context *c)
 {
 	void *p;
 
-	p = alloc_mapping();
-	c->collapse("Do not collapse empty PTE table", p, false);
+	p = alloc_mapping(1);
+	c->collapse("Do not collapse empty PTE table", p, 1, false);
 	munmap(p, hpage_pmd_size);
 }
 
@@ -639,10 +644,10 @@ static void collapse_single_pte_entry(struct collapse_context *c)
 {
 	void *p;
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 	fill_memory(p, 0, page_size);
 	c->collapse("Collapse PTE table with single PTE entry present", p,
-		    true);
+		    1, true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
@@ -656,16 +661,17 @@ static void collapse_max_ptes_none(struct collapse_context *c)
 	settings.khugepaged.max_ptes_none = max_ptes_none;
 	push_settings(&settings);
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 
 	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
-	c->collapse("Maybe collapse with max_ptes_none exceeded", p,
+	c->collapse("Maybe collapse with max_ptes_none exceeded", p, 1,
 		    !c->enforce_pte_scan_limits);
 	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
 
 	if (c->enforce_pte_scan_limits) {
 		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
-		c->collapse("Collapse with max_ptes_none PTEs empty", p, true);
+		c->collapse("Collapse with max_ptes_none PTEs empty", p, 1,
+			    true);
 		validate_memory(p, 0,
 				(hpage_pmd_nr - max_ptes_none) * page_size);
 	}
@@ -677,7 +683,7 @@ static void collapse_swapin_single_pte(struct collapse_context *c)
 {
 	void *p;
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 	fill_memory(p, 0, hpage_pmd_size);
 
 	printf("Swapout one page...");
@@ -692,7 +698,7 @@ static void collapse_swapin_single_pte(struct collapse_context *c)
 		goto out;
 	}
 
-	c->collapse("Collapse with swapping in single PTE entry", p, true);
+	c->collapse("Collapse with swapping in single PTE entry", p, 1, true);
 	validate_memory(p, 0, hpage_pmd_size);
 out:
 	munmap(p, hpage_pmd_size);
@@ -703,7 +709,7 @@ static void collapse_max_ptes_swap(struct collapse_context *c)
 	int max_ptes_swap = read_num("khugepaged/max_ptes_swap");
 	void *p;
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 	fill_memory(p, 0, hpage_pmd_size);
 
 	printf("Swapout %d of %d pages...", max_ptes_swap + 1, hpage_pmd_nr);
@@ -718,7 +724,7 @@ static void collapse_max_ptes_swap(struct collapse_context *c)
 		goto out;
 	}
 
-	c->collapse("Maybe collapse with max_ptes_swap exceeded", p,
+	c->collapse("Maybe collapse with max_ptes_swap exceeded", p, 1,
 		    !c->enforce_pte_scan_limits);
 	validate_memory(p, 0, hpage_pmd_size);
 
@@ -738,7 +744,7 @@ static void collapse_max_ptes_swap(struct collapse_context *c)
 		}
 
 		c->collapse("Collapse with max_ptes_swap pages swapped out", p,
-			    true);
+			    1, true);
 		validate_memory(p, 0, hpage_pmd_size);
 	}
 out:
@@ -753,13 +759,13 @@ static void collapse_single_pte_entry_compound(struct collapse_context *c)
 
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
 	printf("Split huge page leaving single PTE mapping compound page...");
 	madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED);
-	if (!check_huge(p))
+	if (check_huge(p, 0))
 		success("OK");
 	else
 		fail("Fail");
 
 	c->collapse("Collapse PTE table with single PTE mapping compound page",
-		    p, true);
+		    p, 1, true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
@@ -772,12 +778,12 @@ static void collapse_full_of_compound(struct collapse_context *c)
 	printf("Split huge page leaving single PTE page table full of compound pages...");
 	madvise(p, page_size, MADV_NOHUGEPAGE);
 	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
-	if (!check_huge(p))
+	if (check_huge(p, 0))
 		success("OK");
 	else
 		fail("Fail");
 
-	c->collapse("Collapse PTE table full of compound pages", p, true);
+	c->collapse("Collapse PTE table full of compound pages", p, 1, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
@@ -787,14 +793,14 @@ static void collapse_compound_extreme(struct collapse_context *c)
 	void *p;
 	int i;
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 	for (i = 0; i < hpage_pmd_nr; i++) {
 		printf("\rConstruct PTE page table full of different PTE-mapped compound pages %3d/%d...",
 		       i + 1, hpage_pmd_nr);
 
 		madvise(BASE_ADDR, hpage_pmd_size, MADV_HUGEPAGE);
 		fill_memory(BASE_ADDR, 0, hpage_pmd_size);
-		if (!check_huge(BASE_ADDR)) {
+		if (!check_huge(BASE_ADDR, 1)) {
 			printf("Failed to allocate huge page\n");
 			exit(EXIT_FAILURE);
 		}
@@ -823,12 +829,12 @@ static void collapse_compound_extreme(struct collapse_context *c)
 	munmap(BASE_ADDR, hpage_pmd_size);
 
 	fill_memory(p, 0, hpage_pmd_size);
-	if (!check_huge(p))
+	if (check_huge(p, 0))
 		success("OK");
 	else
 		fail("Fail");
 
-	c->collapse("Collapse PTE table full of different compound pages", p,
+	c->collapse("Collapse PTE table full of different compound pages", p, 1,
 		    true);
 
 	validate_memory(p, 0, hpage_pmd_size);
@@ -840,11 +846,11 @@ static void collapse_fork(struct collapse_context *c)
 	int wstatus;
 	void *p;
 
-	p = alloc_mapping();
+	p = alloc_mapping(1);
 
 	printf("Allocate small page...");
 	fill_memory(p, 0, page_size);
-	if (!check_huge(p))
+	if (check_huge(p, 0))
 		success("OK");
 	else
 		fail("Fail");
@@ -855,14 +861,14 @@ static void collapse_fork(struct collapse_context *c)
 		skip_settings_restore = true;
 		exit_status = 0;
 
-		if (!check_huge(p))
+		if (check_huge(p, 0))
 			success("OK");
 		else
 			fail("Fail");
 
 		fill_memory(p, page_size, 2 * page_size);
 		c->collapse("Collapse PTE table with single page shared with parent process",
-			    p, true);
+			    p, 1, true);
 
 		validate_memory(p, 0, page_size);
 		munmap(p, hpage_pmd_size);
@@ -873,7 +879,7 @@ static void collapse_fork(struct collapse_context *c)
 	exit_status += WEXITSTATUS(wstatus);
 
 	printf("Check if parent still has small page...");
-	if (!check_huge(p))
+	if (check_huge(p, 0))
 		success("OK");
 	else
 		fail("Fail");
@@ -893,7 +899,7 @@ static void collapse_fork_compound(struct collapse_context *c)
 		skip_settings_restore = true;
 		exit_status = 0;
 
-		if (check_huge(p))
+		if (check_huge(p, 1))
 			success("OK");
 		else
 			fail("Fail");
@@ -901,7 +907,7 @@ static void collapse_fork_compound(struct collapse_context *c)
 		printf("Split huge page PMD in child process...");
 		madvise(p, page_size, MADV_NOHUGEPAGE);
 		madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
-		if (!check_huge(p))
+		if (check_huge(p, 0))
 			success("OK");
 		else
 			fail("Fail");
@@ -909,7 +915,7 @@ static void collapse_fork_compound(struct collapse_context *c)
 
 		write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1);
 		c->collapse("Collapse PTE table full of compound pages in child",
-			    p, true);
+			    p, 1, true);
 		write_num("khugepaged/max_ptes_shared",
 			  current_settings()->khugepaged.max_ptes_shared);
 
@@ -922,7 +928,7 @@ static void collapse_fork_compound(struct collapse_context *c)
 	exit_status += WEXITSTATUS(wstatus);
 
 	printf("Check if parent still has huge page...");
-	if (check_huge(p))
+	if (check_huge(p, 1))
 		success("OK");
 	else
 		fail("Fail");
@@ -943,7 +949,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c)
 		skip_settings_restore = true;
 		exit_status = 0;
 
-		if (check_huge(p))
+		if (check_huge(p, 1))
 			success("OK");
 		else
 			fail("Fail");
@@ -951,26 +957,26 @@ static void collapse_max_ptes_shared(struct collapse_context *c)
 		printf("Trigger CoW on page %d of %d...",
 		       hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr);
 		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size);
-		if (!check_huge(p))
+		if (check_huge(p, 0))
 			success("OK");
 		else
 			fail("Fail");
 
 		c->collapse("Maybe collapse with max_ptes_shared exceeded", p,
-			    !c->enforce_pte_scan_limits);
+			    1, !c->enforce_pte_scan_limits);
 
 		if (c->enforce_pte_scan_limits) {
 			printf("Trigger CoW on page %d of %d...",
 			       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
 			fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) *
 				    page_size);
-			if (!check_huge(p))
+			if (check_huge(p, 0))
 				success("OK");
 			else
 				fail("Fail");
 
 			c->collapse("Collapse with max_ptes_shared PTEs shared",
-				    p, true);
+				    p, 1, true);
 		}
 
 		validate_memory(p, 0, hpage_pmd_size);
@@ -982,7 +988,7 @@ static void collapse_max_ptes_shared(struct collapse_context *c)
 	exit_status += WEXITSTATUS(wstatus);
 
 	printf("Check if parent still has huge page...");
-	if (check_huge(p))
+	if (check_huge(p, 1))
 		success("OK");
 	else
 		fail("Fail");
@@ -995,7 +1001,7 @@
static void madvise_collapse_existing_thps(void) void *p; int err; - p = alloc_mapping(); + p = alloc_mapping(1); fill_memory(p, 0, hpage_pmd_size); printf("Collapse fully populated PTE table..."); @@ -1005,11 +1011,11 @@ static void madvise_collapse_existing_thps(void) * MADV_COLLAPSE in "madvise" mode. */ err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); - if (err == 0 && check_huge(p)) { + if (err == 0 && check_huge(p, 1)) { success("OK"); printf("Re-collapse PMD-mapped hugepage"); err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); - if (err == 0 && check_huge(p)) + if (err == 0 && check_huge(p, 1)) success("OK"); else fail("Fail");
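
A note for readers following the interface change: this series appears to convert
check_huge() from a boolean "is a huge page present?" test into an exact-count
check, which is why the sense of several conditions above flips (e.g.
"if (!check_huge(p))" becomes "if (check_huge(p, 0))", and wait_for_scan() now
waits for exactly nr_hpages hugepages). Below is a minimal sketch of those
semantics, assuming the helper keys off the AnonHugePages field of
/proc/self/smaps for the VMA containing the address; the function name and
parsing details are illustrative stand-ins, not this patch's verbatim code:

/*
 * Sketch only -- assumes check_huge() compares the AnonHugePages
 * accounting of the backing VMA against an exact expected count.
 */
#include <stdbool.h>
#include <stdio.h>

static bool check_huge_sketch(void *addr, int nr_hpages,
			      unsigned long hpage_pmd_size)
{
	char line[256];
	unsigned long start, end, thp_kb = 0;
	bool in_vma = false;
	FILE *fp = fopen("/proc/self/smaps", "r");

	if (!fp)
		return false;

	while (fgets(line, sizeof(line), fp)) {
		/* VMA header lines look like "<start>-<end> rw-p ..." */
		if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
			in_vma = (unsigned long)addr >= start &&
				 (unsigned long)addr < end;
		} else if (in_vma &&
			   sscanf(line, "AnonHugePages: %lu kB", &thp_kb) == 1) {
			break;
		}
	}
	fclose(fp);

	/* Exact count: check_huge(p, 0) asserts no THPs back 'addr' */
	return thp_kb == nr_hpages * (hpage_pmd_size >> 10);
}

Under these semantics, check_huge(p, 0) succeeds only when no PMD-mapped THP
backs the mapping at all, which is what the flipped assertions above rely on.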