From patchwork Tue Feb 25 00:08:24 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Weiner <hannes@cmpxchg.org>
X-Patchwork-Id: 13989136
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] mm: page_alloc: don't steal single pages from biggest buddy
Date: Mon, 24 Feb 2025 19:08:24 -0500
Message-ID: <20250225001023.1494422-2-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250225001023.1494422-1-hannes@cmpxchg.org>
References: <20250225001023.1494422-1-hannes@cmpxchg.org>

The fallback code searches for the biggest buddy first in an attempt
to steal the whole block and encourage type grouping down the line.

The approach used to be this:

- Non-movable requests will split the largest buddy and steal the
  remainder. This splits up contiguity, but it allows subsequent
  requests of this type to fall back into adjacent space.

- Movable requests go and look for the smallest buddy instead. The
  thinking is that movable requests can be compacted, so grouping is
  less important than retaining contiguity.

c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block
conversion") enforces freelist type hygiene, which restricts stealing
to either claiming the whole block or just taking the requested chunk;
no additional pages or buddy remainders can be stolen any more.

The patch mishandled when to switch to finding the smallest buddy in
that new reality. As a result, it may steal the exact request size,
but from the biggest buddy. This causes fracturing for no good reason.

Fix this by committing to the new behavior: either steal the whole
block, or fall back to the smallest buddy.

Remove single-page stealing from steal_suitable_fallback(). Rename it
to try_to_steal_block() to make the intentions clear. If this fails,
always fall back to the smallest buddy.
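
To make the new decision flow concrete - claim the whole block from the
biggest suitable buddy when type hygiene allows it, otherwise take the
smallest buddy that fits - here is a minimal userspace sketch. It is not
kernel code: NR_ORDERS, free_pages[], can_claim_whole_block() and
fallback() are made-up stand-ins for the real free areas,
find_suitable_fallback() and try_to_steal_block().

#include <stdbool.h>
#include <stdio.h>

#define NR_ORDERS 11   /* stand-in for NR_PAGE_ORDERS */

/* pretend state: free buddies of a foreign migratetype, per order */
static int free_pages[NR_ORDERS] = { 9, 4, 2, 1, 0, 0, 1, 0, 0, 0, 1 };

/* stand-in: whether the block backing this buddy may be claimed whole */
static bool can_claim_whole_block(int order)
{
        /* e.g. only buddies covering a full pageblock qualify here */
        return order == NR_ORDERS - 1;
}

/* Return the order the request was served from, or -1 if none found. */
static int fallback(int order)
{
        int current_order;

        /* Pass 1: biggest buddy first, but only to claim a whole block. */
        for (current_order = NR_ORDERS - 1; current_order >= order; current_order--) {
                if (!free_pages[current_order])
                        continue;
                if (can_claim_whole_block(current_order)) {
                        free_pages[current_order]--;
                        return current_order;
                }
                /* biggest buddy can't be claimed whole: don't fracture it */
                break;
        }

        /* Pass 2: otherwise take the smallest buddy that fits. */
        for (current_order = order; current_order < NR_ORDERS; current_order++) {
                if (free_pages[current_order]) {
                        free_pages[current_order]--;
                        return current_order;
                }
        }
        return -1;
}

int main(void)
{
        printf("order-2 request served from order %d\n", fallback(2));
        return 0;
}

The point is the break in the first pass: when the biggest buddy cannot
be claimed wholesale, it is left intact and the search restarts from the
smallest order instead, which mirrors what the hunks below do in
__rmqueue_fallback().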

The following is from 4 runs of mmtest's thpchallenge. "Pollute" is
single page fallback, "steal" is conversion of a partially used block.
The numbers for free block conversions (omitted) are comparable.

                                      vanilla   patched

@pollute[unmovable from reclaimable]:      27       106
@pollute[unmovable from movable]:          82        46
@pollute[reclaimable from unmovable]:     256        83
@pollute[reclaimable from movable]:        46         8
@pollute[movable from unmovable]:        4841       868
@pollute[movable from reclaimable]:      5278     12568

@steal[unmovable from reclaimable]:        11        12
@steal[unmovable from movable]:           113        49
@steal[reclaimable from unmovable]:        19        34
@steal[reclaimable from movable]:          47        21
@steal[movable from unmovable]:           250       183
@steal[movable from reclaimable]:          81        93

The allocator appears to do a better job at keeping stealing and
polluting to the first fallback preference. As a result, the numbers
for "from movable" - the least preferred fallback option, and most
detrimental to compactability - are down across the board.

Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Suggested-by: Vlastimil Babka
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 80 +++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 46 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 16dfcf7ade74..9ea14ec52449 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1986,13 +1986,12 @@ static inline bool boost_watermark(struct zone *zone)
  * can claim the whole pageblock for the requested migratetype. If not, we check
  * the pageblock for constituent pages; if at least half of the pages are free
  * or compatible, we can still claim the whole block, so pages freed in the
- * future will be put on the correct free list. Otherwise, we isolate exactly
- * the order we need from the fallback block and leave its migratetype alone.
+ * future will be put on the correct free list.
  */
 static struct page *
-steal_suitable_fallback(struct zone *zone, struct page *page,
-                        int current_order, int order, int start_type,
-                        unsigned int alloc_flags, bool whole_block)
+try_to_steal_block(struct zone *zone, struct page *page,
+                   int current_order, int order, int start_type,
+                   unsigned int alloc_flags)
 {
         int free_pages, movable_pages, alike_pages;
         unsigned long start_pfn;
@@ -2005,7 +2004,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
          * highatomic accounting.
          */
         if (is_migrate_highatomic(block_type))
-                goto single_page;
+                return NULL;
 
         /* Take ownership for orders >= pageblock_order */
         if (current_order >= pageblock_order) {
@@ -2026,14 +2025,10 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
         if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
                 set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
-        /* We are not allowed to try stealing from the whole block */
-        if (!whole_block)
-                goto single_page;
-
         /* moving whole block can fail due to zone boundary conditions */
         if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages,
                                        &movable_pages))
-                goto single_page;
+                return NULL;
 
         /*
          * Determine how many pages are compatible with our allocation.
@@ -2066,9 +2061,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
                 return __rmqueue_smallest(zone, order, start_type);
         }
 
-single_page:
-        page_del_and_expand(zone, page, order, current_order, block_type);
-        return page;
+        return NULL;
 }
 
 /*
@@ -2250,14 +2243,19 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 }
 
 /*
- * Try finding a free buddy page on the fallback list and put it on the free
- * list of requested migratetype, possibly along with other pages from the same
- * block, depending on fragmentation avoidance heuristics. Returns true if
- * fallback was found so that __rmqueue_smallest() can grab it.
+ * Try finding a free buddy page on the fallback list.
+ *
+ * This will attempt to steal a whole pageblock for the requested type
+ * to ensure grouping of such requests in the future.
+ *
+ * If a whole block cannot be stolen, regress to __rmqueue_smallest()
+ * logic to at least break up as little contiguity as possible.
  *
  * The use of signed ints for order and current_order is a deliberate
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
+ *
+ * Return the stolen page, or NULL if none can be found.
  */
 static __always_inline struct page *
 __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
@@ -2291,45 +2289,35 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
                 if (fallback_mt == -1)
                         continue;
 
-                /*
-                 * We cannot steal all free pages from the pageblock and the
-                 * requested migratetype is movable. In that case it's better to
-                 * steal and split the smallest available page instead of the
-                 * largest available page, because even if the next movable
-                 * allocation falls back into a different pageblock than this
-                 * one, it won't cause permanent fragmentation.
-                 */
-                if (!can_steal && start_migratetype == MIGRATE_MOVABLE
-                                        && current_order > order)
-                        goto find_smallest;
+                if (!can_steal)
+                        break;
 
-                goto do_steal;
+                page = get_page_from_free_area(area, fallback_mt);
+                page = try_to_steal_block(zone, page, current_order, order,
+                                          start_migratetype, alloc_flags);
+                if (page)
+                        goto got_one;
         }
 
-        return NULL;
+        if (alloc_flags & ALLOC_NOFRAGMENT)
+                return NULL;
 
-find_smallest:
+        /* No luck stealing blocks. Find the smallest fallback page */
         for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
                 area = &(zone->free_area[current_order]);
                 fallback_mt = find_suitable_fallback(area, current_order,
                                 start_migratetype, false, &can_steal);
-                if (fallback_mt != -1)
-                        break;
-        }
-
-        /*
-         * This should not happen - we already found a suitable fallback
-         * when looking for the largest page.
-         */
-        VM_BUG_ON(current_order > MAX_PAGE_ORDER);
+                if (fallback_mt == -1)
+                        continue;
 
-do_steal:
-        page = get_page_from_free_area(area, fallback_mt);
+                page = get_page_from_free_area(area, fallback_mt);
+                page_del_and_expand(zone, page, order, current_order, fallback_mt);
+                goto got_one;
+        }
 
-        /* take off list, maybe claim block, expand remainder */
-        page = steal_suitable_fallback(zone, page, current_order, order,
-                                       start_migratetype, alloc_flags, can_steal);
+        return NULL;
 
+got_one:
         trace_mm_page_alloc_extfrag(page, order, current_order,
                 start_migratetype, fallback_mt);