From patchwork Fri Apr 14 08:22:22 2023
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 13211102
Date: Fri, 14 Apr 2023 09:22:22 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Vlastimil Babka, Michal Hocko, Oscar Salvador, Yuanxi Liu,
 David Hildenbrand, Linux-MM, LKML, Mel Gorman
Subject: [PATCH] mm: page_alloc: Assume huge tail pages are valid when allocating contiguous pages
Message-ID: <20230414082222.idgw745cgcduzy37@techsingularity.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
A bug was reported by Yuanxi Liu where allocating 1G pages at runtime takes an excessive amount of time on machines with large amounts of memory. Further testing of huge page allocation showed that the cost is linear, i.e. when allocating 1G pages in batches of 10, the time to raise nr_hugepages from 10->20->30->etc increases linearly even though only 10 pages are allocated at each step.
Profiles indicated that much of the time is spent checking the validity of PFNs within already existing huge pages and then attempting a migration that fails after isolating the range, draining pages and a whole lot of other useless work.

Commit eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") removed two checks, one of which skipped huge pages for contiguous allocations on the grounds that huge pages can migrate. While there may be value in migrating a 2M page to satisfy a 1G allocation, it is pointless to move a 1G page for a new 1G allocation or to scan for validity within an existing huge page.

Reintroduce the PageHuge check with some limitations. The new check will still allow an attempt to migrate a 2M page for a 1G allocation, but contiguous requests within CMA regions will always attempt the migration. The function is also renamed because pfn_range_valid_contig is easily confused with a pfn_valid() check, which is not what the function does.

The hpagealloc test allocates huge pages in batches and reports the average latency per page over time. This test happens just after boot when fragmentation is not an issue. Units are in milliseconds.
hpagealloc
                              6.3.0-rc6            6.3.0-rc6          6.3.0-rc6
                                vanilla hugeallocrevert-v1r1  hugeallocfix-v1r1
Min       Latency     26.42 (   0.00%)      5.07 (  80.82%)    20.30 (  23.19%)
1st-qrtle Latency    356.61 (   0.00%)      5.34 (  98.50%)    20.57 (  94.23%)
2nd-qrtle Latency    697.26 (   0.00%)      5.47 (  99.22%)    20.84 (  97.01%)
3rd-qrtle Latency    972.94 (   0.00%)      5.50 (  99.43%)    21.16 (  97.83%)
Max-1     Latency     26.42 (   0.00%)      5.07 (  80.82%)    20.30 (  23.19%)
Max-5     Latency     82.14 (   0.00%)      5.11 (  93.78%)    20.49 (  75.05%)
Max-10    Latency    150.54 (   0.00%)      5.20 (  96.55%)    20.52 (  86.37%)
Max-90    Latency   1164.45 (   0.00%)      5.53 (  99.52%)    21.20 (  98.18%)
Max-95    Latency   1223.06 (   0.00%)      5.55 (  99.55%)    21.22 (  98.26%)
Max-99    Latency   1278.67 (   0.00%)      5.57 (  99.56%)    22.81 (  98.22%)
Max       Latency   1310.90 (   0.00%)      8.06 (  99.39%)    24.87 (  98.10%)
Amean     Latency    678.36 (   0.00%)      5.44 *  99.20%*    20.93 *  96.91%*

                  6.3.0-rc6  6.3.0-rc6        6.3.0-rc6
                    vanilla  revert-v1  hugeallocfix-v1
Duration User          0.28       0.27             0.27
Duration System      808.66      17.77            36.63
Duration Elapsed     830.87      18.08            36.95

The vanilla kernel is poor, taking up to 1.3 seconds to allocate a huge page and almost 14 minutes in total to run the test. Reverting the problematic commit reduces the worst-case latency to 8ms and with this patch it is 24ms. The patch fixes the main issue of skipping huge pages but leaves the page_count() check alone, because a page with an elevated count can potentially migrate. Note that a simpler fix that only checks PageHuge performs similarly, with the caveat that 1G allocations may fail due to smaller huge pages that could have been migrated.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
Fixes: eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
Reported-by: Yuanxi Liu
Signed-off-by: Mel Gorman
---
 mm/page_alloc.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..9036306b3d53 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9434,22 +9434,45 @@ static int __alloc_contig_pages(unsigned long start_pfn,
 				  gfp_mask);
 }
 
-static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
+/*
+ * Returns true if it's worth trying to migrate pages within a range for
+ * a contiguous allocation.
+ */
+static bool pfn_range_suitable_contig(struct zone *z, unsigned long start_pfn,
 			   unsigned long nr_pages)
 {
 	unsigned long i, end_pfn = start_pfn + nr_pages;
 	struct page *page;
 
 	for (i = start_pfn; i < end_pfn; i++) {
+		/* Must be valid. */
 		page = pfn_to_online_page(i);
 		if (!page)
 			return false;
 
+		/* Must be within one zone. */
 		if (page_zone(page) != z)
 			return false;
 
+		/* Reserved pages cannot migrate. */
 		if (PageReserved(page))
 			return false;
+
+		/*
+		 * Do not migrate huge pages that span the size of the region
+		 * being allocated contiguous. e.g. Do not migrate a 1G page
+		 * for a 1G allocation request. CMA is an exception as the
+		 * region may be reserved for hardware that requires physical
+		 * memory without a MMU or scatter/gather capability.
+		 *
+		 * Note that the compound check is race-prone versus
+		 * free/split/collapse but it should be safe and result in
+		 * a premature skip or a useless migration attempt.
+		 */
+		if (PageHuge(page) && compound_nr(page) >= nr_pages &&
+		    !is_migrate_cma_page(page)) {
+			return false;
+		}
 	}
 	return true;
 }
@@ -9498,7 +9521,7 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
 	pfn = ALIGN(zone->zone_start_pfn, nr_pages);
 	while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
-		if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
+		if (pfn_range_suitable_contig(zone, pfn, nr_pages)) {
 			/*
 			 * We release the zone lock here because
 			 * alloc_contig_range() will also lock the zone