From patchwork Wed Jul  6 23:59:20 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12908926
Date: Wed, 6 Jul 2022 16:59:20 -0700
In-Reply-To: <20220706235936.2197195-1-zokeefe@google.com>
Message-Id: <20220706235936.2197195-3-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com>
Subject: [mm-unstable v7 02/18] mm: khugepaged: don't carry huge page to the
 next loop for !CONFIG_NUMA
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park,
 Song Liu, Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
 Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
 "Zach O'Keefe"
From: Yang Shi

khugepaged has an optimization that reduces huge page allocation calls for
!CONFIG_NUMA by carrying an allocated-but-failed-to-collapse huge page over
to the next scan loop. CONFIG_NUMA kernels don't do this, since the next
loop may try to collapse a huge page from a different node, so carrying the
page makes little sense there.

But when NUMA=n, the huge page is allocated by khugepaged_prealloc_page()
before the address space is scanned, which means a huge page may be
allocated even though there is no suitable range to collapse; the page is
then simply freed once khugepaged has made enough progress. As a result, a
NUMA=n run can show five times as many thp_collapse_alloc events as a
NUMA=y run. The far greater number of pointless THP allocations actually
makes things worse and defeats the purpose of the optimization.

This could be fixed by carrying the huge page across scans, but that would
complicate the code further, and the page might then be carried
indefinitely. Stepping back, the optimization itself no longer seems worth
keeping:

* Few users build NUMA=n kernels nowadays, even when the kernel actually
  runs on a non-NUMA machine. Some small devices may run NUMA=n kernels,
  but they are unlikely to use THP.

* Since commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
  stored on the per-cpu lists"), THPs can be cached on the pcp lists,
  which largely does the job this optimization was doing.

Signed-off-by: Yang Shi
Signed-off-by: Zach O'Keefe
Co-developed-by: Peter Xu
Signed-off-by: Peter Xu
Cc: Hugh Dickins
Cc: "Kirill A.
Shutemov"
---
 mm/khugepaged.c | 120 +++++++++++-------------------------------------
 1 file changed, 26 insertions(+), 94 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5269d15e20f6..196eaadbf415 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -796,29 +796,16 @@ static int khugepaged_find_target_node(void)
 	last_khugepaged_target_node = target_node;
 	return target_node;
 }
-
-static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
+#else
+static int khugepaged_find_target_node(void)
 {
-	if (IS_ERR(*hpage)) {
-		if (!*wait)
-			return false;
-
-		*wait = false;
-		*hpage = NULL;
-		khugepaged_alloc_sleep();
-	} else if (*hpage) {
-		put_page(*hpage);
-		*hpage = NULL;
-	}
-
-	return true;
+	return 0;
 }
+#endif

 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
-
 	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
@@ -830,74 +817,6 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	count_vm_event(THP_COLLAPSE_ALLOC);
 	return *hpage;
 }
-#else
-static int khugepaged_find_target_node(void)
-{
-	return 0;
-}
-
-static inline struct page *alloc_khugepaged_hugepage(void)
-{
-	struct page *page;
-
-	page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
-			   HPAGE_PMD_ORDER);
-	if (page)
-		prep_transhuge_page(page);
-	return page;
-}
-
-static struct page *khugepaged_alloc_hugepage(bool *wait)
-{
-	struct page *hpage;
-
-	do {
-		hpage = alloc_khugepaged_hugepage();
-		if (!hpage) {
-			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-			if (!*wait)
-				return NULL;
-
-			*wait = false;
-			khugepaged_alloc_sleep();
-		} else
-			count_vm_event(THP_COLLAPSE_ALLOC);
-	} while (unlikely(!hpage) && likely(hugepage_flags_enabled()));
-
-	return hpage;
-}
-
-static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
-{
-	/*
-	 * If the hpage allocated earlier was briefly exposed in page cache
-	 * before collapse_file() failed, it is possible that racing lookups
-	 * have not yet completed, and would then be unpleasantly surprised by
-	 * finding the hpage reused for the same mapping at a different offset.
-	 * Just release the previous allocation if there is any danger of that.
-	 */
-	if (*hpage && page_count(*hpage) > 1) {
-		put_page(*hpage);
-		*hpage = NULL;
-	}
-
-	if (!*hpage)
-		*hpage = khugepaged_alloc_hugepage(wait);
-
-	if (unlikely(!*hpage))
-		return false;
-
-	return true;
-}
-
-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
-{
-	VM_BUG_ON(!*hpage);
-
-	return *hpage;
-}
-#endif

 /*
  * If mmap_lock temporarily dropped, revalidate vma
@@ -1148,8 +1067,10 @@ static void collapse_huge_page(struct mm_struct *mm,
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
+	if (!IS_ERR_OR_NULL(*hpage)) {
 		mem_cgroup_uncharge(page_folio(*hpage));
+		put_page(*hpage);
+	}
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
@@ -1951,8 +1872,10 @@ static void collapse_file(struct mm_struct *mm,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
+	if (!IS_ERR_OR_NULL(*hpage)) {
 		mem_cgroup_uncharge(page_folio(*hpage));
+		put_page(*hpage);
+	}
 	/* TODO: tracepoints */
 }
@@ -2197,10 +2120,7 @@ static void khugepaged_do_scan(void)

 	lru_add_drain_all();

-	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
-			break;
-
+	while (true) {
 		cond_resched();

 		if (unlikely(kthread_should_stop() || try_to_freeze()))
@@ -2216,10 +2136,22 @@ static void khugepaged_do_scan(void)
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
-	}

-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+		if (progress >= pages)
+			break;
+
+		if (IS_ERR(hpage)) {
+			/*
+			 * If fail to allocate the first time, try to sleep for
+			 * a while. When hit again, cancel the scan.
+			 */
+			if (!wait)
+				break;
+			wait = false;
+			hpage = NULL;
+			khugepaged_alloc_sleep();
+		}
+	}
 }

 static bool khugepaged_should_wakeup(void)