From patchwork Fri Mar 8 07:49:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Lance Yang
X-Patchwork-Id: 13586521
From: Lance Yang
To: akpm@linux-foundation.org
Cc: david@redhat.com, mhocko@suse.com, zokeefe@google.com,
 shy828301@gmail.com, xiehuan09@gmail.com, songmuchun@bytedance.com,
 minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Lance Yang
Subject: [PATCH 1/1] mm/khugepaged: reduce process visible downtime by pre-zeroing hugepage
Date: Fri, 8 Mar 2024 15:49:21 +0800
Message-Id: <20240308074921.45752-1-ioworker0@gmail.com>
X-Mailer: git-send-email 2.33.1

This patch reduces the process visible downtime during hugepage
collapse. It does so by pre-zeroing the hugepage before mmap_lock is
acquired in write mode when nr_ptes_none >= 256, without affecting the
efficiency of khugepaged.

On an Intel Core i5 CPU, the process visible downtime during hugepage
collapse is as follows:

| nr_ptes_none | w/o __GFP_ZERO | w/ __GFP_ZERO | Change  |
-----------------------------------------------------------
| 511          | 233us          | 95us          | -59.21% |
| 384          | 376us          | 219us         | -41.20% |
| 256          | 421us          | 323us         | -23.28% |
| 128          | 523us          | 507us         |  -3.06% |

Of course, alloc_charge_hpage() takes longer to run with the
__GFP_ZERO flag:

| Func                 | w/o __GFP_ZERO | w/ __GFP_ZERO |
|----------------------|----------------|---------------|
| alloc_charge_hpage   | 198us          | 295us         |

But this is not a big deal, because it does not increase the total time
khugepaged spends collapsing a hugepage. In fact, the total time would
decrease.

Signed-off-by: Lance Yang
---
 mm/khugepaged.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 38830174608f..a2872596b865 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -88,6 +88,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
 static unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
+static unsigned int khugepaged_min_ptes_none_prezero __read_mostly;
 
 #define MM_SLOTS_HASH_BITS 10
 static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
@@ -96,6 +97,7 @@ static struct kmem_cache *mm_slot_cache __ro_after_init;
 
 struct collapse_control {
 	bool is_khugepaged;
+	bool alloc_zeroed_hpage;
 
 	/* Num pages scanned per node */
 	u32 node_load[MAX_NUMNODES];
@@ -396,6 +398,7 @@ int __init khugepaged_init(void)
 	khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
 	khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
 	khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
+	khugepaged_min_ptes_none_prezero = HPAGE_PMD_NR / 2;
 
 	return 0;
 }
@@ -782,6 +785,7 @@ static int __collapse_huge_page_copy(pte_t *pte,
 				     struct vm_area_struct *vma,
 				     unsigned long address,
 				     spinlock_t *ptl,
+				     struct collapse_control *cc,
 				     struct list_head *compound_pagelist)
 {
 	struct page *src_page;
@@ -797,7 +801,8 @@ static int __collapse_huge_page_copy(pte_t *pte,
 	     _pte++, page++, _address += PAGE_SIZE) {
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			clear_user_highpage(page, _address);
+			if (!cc->alloc_zeroed_hpage)
+				clear_user_highpage(page, _address);
 			continue;
 		}
 		src_page = pte_page(pteval);
@@ -1067,6 +1072,9 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
 	int node = hpage_collapse_find_target_node(cc);
 	struct folio *folio;
 
+	if (cc->alloc_zeroed_hpage)
+		gfp |= __GFP_ZERO;
+
 	if (!hpage_collapse_alloc_folio(&folio, gfp, node, &cc->alloc_nmask)) {
 		*hpage = NULL;
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
@@ -1209,7 +1217,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	anon_vma_unlock_write(vma->anon_vma);
 
 	result = __collapse_huge_page_copy(pte, hpage, pmd, _pmd,
-					   vma, address, pte_ptl,
+					   vma, address, pte_ptl, cc,
 					   &compound_pagelist);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
@@ -1272,6 +1280,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+	cc->alloc_zeroed_hpage = false;
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	if (!pte) {
 		result = SCAN_PMD_NULL;
@@ -1408,6 +1417,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
+		if (cc->is_khugepaged &&
+		    none_or_zero >= khugepaged_min_ptes_none_prezero)
+			cc->alloc_zeroed_hpage = true;
+
 		result = collapse_huge_page(mm, address, referenced,
 					    unmapped, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
@@ -2054,7 +2067,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	index = start;
 	list_for_each_entry(page, &pagelist, lru) {
 		while (index < page->index) {
-			clear_highpage(hpage + (index % HPAGE_PMD_NR));
+			if (!cc->alloc_zeroed_hpage)
+				clear_highpage(hpage + (index % HPAGE_PMD_NR));
 			index++;
 		}
 		if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR), page) > 0) {
@@ -2064,7 +2078,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		index++;
 	}
 	while (index < end) {
-		clear_highpage(hpage + (index % HPAGE_PMD_NR));
+		if (!cc->alloc_zeroed_hpage)
+			clear_highpage(hpage + (index % HPAGE_PMD_NR));
 		index++;
 	}
 
@@ -2234,6 +2249,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	swap = 0;
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+	cc->alloc_zeroed_hpage = false;
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
@@ -2305,11 +2321,16 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	rcu_read_unlock();
 
 	if (result == SCAN_SUCCEED) {
-		if (cc->is_khugepaged &&
-		    present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+		if (!cc->is_khugepaged)
+			result = collapse_file(mm, addr, file, start, cc);
+		else if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
+			if (HPAGE_PMD_NR - present >=
+			    khugepaged_min_ptes_none_prezero)
+				cc->alloc_zeroed_hpage = true;
+
 			result = collapse_file(mm, addr, file, start, cc);
 		}
 	}
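
For readers who want to check the threshold against the measured rows,
the pre-zero decision made in hpage_collapse_scan_pmd() can be sketched
in user space as below. This is only an illustration, not part of the
patch: should_prezero() is a hypothetical helper, and HPAGE_PMD_NR is
assumed to be 512 (4 KiB base pages with 2 MiB PMD-sized hugepages),
which is what makes HPAGE_PMD_NR / 2 equal to the 256 quoted in the
commit message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: 4 KiB base pages and 2 MiB PMD-sized hugepages, so one
 * PMD maps 512 PTEs. The kernel derives this value per architecture.
 */
#define HPAGE_PMD_NR 512u

/* Same default the patch assigns in khugepaged_init(). */
static const unsigned int khugepaged_min_ptes_none_prezero = HPAGE_PMD_NR / 2;

/*
 * Hypothetical helper (not in the patch): returns true when the scan
 * result would set cc->alloc_zeroed_hpage, i.e. the caller is khugepaged
 * and at least half of the PTEs in the range are none or zero-page.
 */
static inline bool should_prezero(bool is_khugepaged, unsigned int none_or_zero)
{
	/* A PMD range cannot contain more empty PTEs than it has PTEs. */
	assert(none_or_zero <= HPAGE_PMD_NR);

	return is_khugepaged &&
	       none_or_zero >= khugepaged_min_ptes_none_prezero;
}
```

Under these assumptions, should_prezero(true, n) is true for the 511,
384 and 256 rows of the table and false for the 128 row, and it is
always false when is_khugepaged is false, matching the anonymous-memory
path in the diff. The file path applies the same comparison to
HPAGE_PMD_NR - present.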