From patchwork Wed Mar 17 11:12:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 12145551 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1943C433DB for ; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 621C664F69 for ; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 621C664F69 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E2ABD6B0070; Wed, 17 Mar 2021 07:13:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E014A6B006E; Wed, 17 Mar 2021 07:13:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CC7FE6B0073; Wed, 17 Mar 2021 07:13:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0105.hostedemail.com [216.40.44.105]) by kanga.kvack.org (Postfix) with ESMTP id B0BC66B006E for ; Wed, 17 Mar 2021 07:13:05 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 5D9AD8249980 for ; Wed, 17 Mar 2021 11:13:05 +0000 (UTC) X-FDA: 77929104330.09.1AF6E82 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf17.hostedemail.com (Postfix) with ESMTP id D49D540001FE for ; Wed, 17 Mar 2021 11:13:04 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id C71F5AC23; Wed, 17 Mar 2021 11:13:03 +0000 (UTC) From: Oscar Salvador To: Andrew Morton Cc: Vlastimil Babka , David Hildenbrand , Michal Hocko , Muchun Song , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador Subject: [PATCH v5 1/5] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range Date: Wed, 17 Mar 2021 12:12:47 +0100 Message-Id: <20210317111251.17808-2-osalvador@suse.de> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20210317111251.17808-1-osalvador@suse.de> References: <20210317111251.17808-1-osalvador@suse.de> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: D49D540001FE X-Stat-Signature: kqxw796qmrjaiopw3xinafpchrb4iaxk Received-SPF: none (suse.de>: No applicable sender policy available) receiver=imf17; identity=mailfrom; envelope-from=""; helo=mx2.suse.de; client-ip=195.135.220.15 X-HE-DKIM-Result: none/none X-HE-Tag: 1615979584-732227 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or -EBUSY, and report them down the chain. The problem is that when migrate_pages() reports -ENOMEM, we keep going till we exhaust all the try-attempts (5 at the moment) instead of bailing out. migrate_pages() bails out right away on -ENOMEM because it is considered a fatal error. Do the same here instead of keep going and retrying. Signed-off-by: Oscar Salvador Acked-by: Vlastimil Babka Reviewed-by: David Hildenbrand Acked-by: Michal Hocko --- mm/page_alloc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cfc72873961d..a4f67063b85f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8481,7 +8481,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, } tries = 0; } else if (++tries == 5) { - ret = ret < 0 ? ret : -EBUSY; + ret = -EBUSY; break; } @@ -8491,6 +8491,12 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, ret = migrate_pages(&cc->migratepages, alloc_migration_target, NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE); + /* + * On -ENOMEM, migrate_pages() bails out right away. It is pointless + * to retry again over this error, so do the same here. + */ + if (ret == -ENOMEM) + break; } if (ret < 0) { putback_movable_pages(&cc->migratepages); From patchwork Wed Mar 17 11:12:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 12145555 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1318CC433E0 for ; Wed, 17 Mar 2021 11:13:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8F53C64F69 for ; Wed, 17 Mar 2021 11:13:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F53C64F69 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8FB0E6B0071; Wed, 17 Mar 2021 07:13:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8D1326B0073; Wed, 17 Mar 2021 07:13:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 773568D0001; Wed, 17 Mar 2021 07:13:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0105.hostedemail.com [216.40.44.105]) by kanga.kvack.org (Postfix) with ESMTP id 5DDF36B0071 for ; Wed, 17 Mar 2021 07:13:06 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 28F016D72 for ; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) X-FDA: 77929104372.09.ADF873A Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf02.hostedemail.com (Postfix) with ESMTP id 905EE407F8E8 for ; Wed, 17 Mar 2021 11:13:05 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 8143EAD72; Wed, 17 Mar 2021 11:13:04 +0000 (UTC) From: Oscar Salvador To: Andrew Morton Cc: Vlastimil Babka , David Hildenbrand , Michal Hocko , Muchun Song , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador Subject: [PATCH v5 2/5] mm,compaction: Let isolate_migratepages_{range,block} return error codes Date: Wed, 17 Mar 2021 12:12:48 +0100 Message-Id: <20210317111251.17808-3-osalvador@suse.de> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20210317111251.17808-1-osalvador@suse.de> References: <20210317111251.17808-1-osalvador@suse.de> X-Stat-Signature: pwqzf11ykfzh55bzdhbkgbcyih87kuie X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 905EE407F8E8 Received-SPF: none (suse.de>: No applicable sender policy available) receiver=imf02; identity=mailfrom; envelope-from=""; helo=mx2.suse.de; client-ip=195.135.220.15 X-HE-DKIM-Result: none/none X-HE-Tag: 1615979585-692035 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, isolate_migratepages_{range,block} and their callers use a pfn == 0 vs pfn != 0 scheme to let the caller know whether there was any error during isolation. This does not work as soon as we need to start reporting different error codes and make sure we pass them down the chain, so they are properly interpreted by functions like e.g: alloc_contig_range. Let us rework isolate_migratepages_{range,block} so we can report error codes. Since isolate_migratepages_block will stop returning the next pfn to be scanned, we reuse the cc->migrate_pfn field to keep track of that. Signed-off-by: Oscar Salvador Acked-by: Vlastimil Babka --- mm/compaction.c | 48 ++++++++++++++++++++++++------------------------ mm/internal.h | 2 +- mm/page_alloc.c | 7 +++---- 3 files changed, 28 insertions(+), 29 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index e04f4476e68e..5769753a8f60 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -787,15 +787,16 @@ static bool too_many_isolated(pg_data_t *pgdat) * * Isolate all pages that can be migrated from the range specified by * [low_pfn, end_pfn). The range is expected to be within same pageblock. - * Returns zero if there is a fatal signal pending, otherwise PFN of the - * first page that was not scanned (which may be both less, equal to or more - * than end_pfn). + * Returns -EINTR in case we need to abort when we have too many isolated pages + * due to e.g: signal pending, async mode or having still pages to migrate, or 0. + * cc->migrate_pfn will contain the next pfn to scan (which may be both less, + * equal to or more that end_pfn). * * The pages are isolated on cc->migratepages list (not required to be empty), * and cc->nr_migratepages is updated accordingly. The cc->migrate_pfn field * is neither read nor updated. */ -static unsigned long +static int isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, unsigned long end_pfn, isolate_mode_t isolate_mode) { @@ -810,6 +811,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, unsigned long next_skip_pfn = 0; bool skip_updated = false; + cc->migrate_pfn = low_pfn; + /* * Ensure that there are not too many pages isolated from the LRU * list by either parallel reclaimers or compaction. If there are, @@ -818,16 +821,16 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, while (unlikely(too_many_isolated(pgdat))) { /* stop isolation if there are still pages not migrated */ if (cc->nr_migratepages) - return 0; + return -EINTR; /* async migration should just abort */ if (cc->mode == MIGRATE_ASYNC) - return 0; + return -EINTR; congestion_wait(BLK_RW_ASYNC, HZ/10); if (fatal_signal_pending(current)) - return 0; + return -EINTR; } cond_resched(); @@ -1130,7 +1133,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, if (nr_isolated) count_compact_events(COMPACTISOLATED, nr_isolated); - return low_pfn; + cc->migrate_pfn = low_pfn; + + return 0; } /** @@ -1139,15 +1144,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, * @start_pfn: The first PFN to start isolating. * @end_pfn: The one-past-last PFN. * - * Returns zero if isolation fails fatally due to e.g. pending signal. - * Otherwise, function returns one-past-the-last PFN of isolated page - * (which may be greater than end_pfn if end fell in a middle of a THP page). + * Returns -EINTR in case isolation fails fatally due to e.g. pending signal, + * or 0. */ -unsigned long +int isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn, unsigned long end_pfn) { unsigned long pfn, block_start_pfn, block_end_pfn; + int ret = 0; /* Scan block by block. First and last block may be incomplete */ pfn = start_pfn; @@ -1166,17 +1171,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn, block_end_pfn, cc->zone)) continue; - pfn = isolate_migratepages_block(cc, pfn, block_end_pfn, - ISOLATE_UNEVICTABLE); + ret = isolate_migratepages_block(cc, pfn, block_end_pfn, + ISOLATE_UNEVICTABLE); - if (!pfn) + if (ret) break; if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX) break; } - return pfn; + return ret; } #endif /* CONFIG_COMPACTION || CONFIG_CMA */ @@ -1847,7 +1852,7 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc) */ for (; block_end_pfn <= cc->free_pfn; fast_find_block = false, - low_pfn = block_end_pfn, + cc->migrate_pfn = low_pfn = block_end_pfn, block_start_pfn = block_end_pfn, block_end_pfn += pageblock_nr_pages) { @@ -1889,10 +1894,8 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc) } /* Perform the isolation */ - low_pfn = isolate_migratepages_block(cc, low_pfn, - block_end_pfn, isolate_mode); - - if (!low_pfn) + if (isolate_migratepages_block(cc, low_pfn, block_end_pfn, + isolate_mode)) return ISOLATE_ABORT; /* @@ -1903,9 +1906,6 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc) break; } - /* Record where migration scanner will be restarted. */ - cc->migrate_pfn = low_pfn; - return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE; } diff --git a/mm/internal.h b/mm/internal.h index 1432feec62df..1f2ccba8e289 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -261,7 +261,7 @@ struct capture_control { unsigned long isolate_freepages_range(struct compact_control *cc, unsigned long start_pfn, unsigned long end_pfn); -unsigned long +int isolate_migratepages_range(struct compact_control *cc, unsigned long low_pfn, unsigned long end_pfn); int find_suitable_fallback(struct free_area *area, unsigned int order, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a4f67063b85f..4cb455355f6d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8474,11 +8474,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, if (list_empty(&cc->migratepages)) { cc->nr_migratepages = 0; - pfn = isolate_migratepages_range(cc, pfn, end); - if (!pfn) { - ret = -EINTR; + ret = isolate_migratepages_range(cc, pfn, end); + if (ret) break; - } + pfn = cc->migrate_pfn; tries = 0; } else if (++tries == 5) { ret = -EBUSY; From patchwork Wed Mar 17 11:12:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 12145557 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E84DFC433E6 for ; Wed, 17 Mar 2021 11:13:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5F59464F69 for ; Wed, 17 Mar 2021 11:13:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5F59464F69 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 862198D0001; Wed, 17 Mar 2021 07:13:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 772E86B0074; Wed, 17 Mar 2021 07:13:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 550C58D0001; Wed, 17 Mar 2021 07:13:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0153.hostedemail.com [216.40.44.153]) by kanga.kvack.org (Postfix) with ESMTP id 3058F6B0073 for ; Wed, 17 Mar 2021 07:13:07 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id DD88452D0 for ; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) X-FDA: 77929104372.12.4165E0C Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf21.hostedemail.com (Postfix) with ESMTP id 43529E0001AF for ; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3D787AD74; Wed, 17 Mar 2021 11:13:05 +0000 (UTC) From: Oscar Salvador To: Andrew Morton Cc: Vlastimil Babka , David Hildenbrand , Michal Hocko , Muchun Song , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador Subject: [PATCH v5 3/5] mm: Make alloc_contig_range handle free hugetlb pages Date: Wed, 17 Mar 2021 12:12:49 +0100 Message-Id: <20210317111251.17808-4-osalvador@suse.de> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20210317111251.17808-1-osalvador@suse.de> References: <20210317111251.17808-1-osalvador@suse.de> X-Stat-Signature: hdc6a9sezi88brt3cpbbrffibf5cm44q X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 43529E0001AF Received-SPF: none (suse.de>: No applicable sender policy available) receiver=imf21; identity=mailfrom; envelope-from=""; helo=mx2.suse.de; client-ip=195.135.220.15 X-HE-DKIM-Result: none/none X-HE-Tag: 1615979586-133274 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: alloc_contig_range will fail if it ever sees a HugeTLB page within the range we are trying to allocate, even when that page is free and can be easily reallocated. This has proved to be problematic for some users of alloc_contic_range, e.g: CMA and virtio-mem, where those would fail the call even when those pages lay in ZONE_MOVABLE and are free. We can do better by trying to replace such page. Free hugepages are tricky to handle so as to no userspace application notices disruption, we need to replace the current free hugepage with a new one. In order to do that, a new function called alloc_and_dissolve_huge_page is introduced. This function will first try to get a new fresh hugepage, and if it succeeds, it will replace the old one in the free hugepage pool. All operations are being handled under hugetlb_lock, so no races are possible. The only exception is when page's refcount is 0, but it still has not been flagged as PageHugeFreed. E.g, below scenario: CPU0 CPU1 __free_huge_page() isolate_or_dissolve_huge_page PageHuge() == T alloc_and_dissolve_huge_page alloc_fresh_huge_page() spin_lock(hugetlb_lock) // PageHuge() && !PageHugeFreed && // !PageCount() spin_unlock(hugetlb_lock) spin_lock(hugetlb_lock) 1) update_and_free_page PageHuge() == F __free_pages() 2) enqueue_huge_page SetPageHugeFreed() spin_unlock(&hugetlb_lock) spin_lock(hugetlb_lock) 1) PageHuge() == F (freed by case#1 from CPU0) 2) PageHuge() == T PageHugeFreed() == T - proceed with replacing the page In the case above we retry as the window race is quite small and we have high chances to succeed next time. With regard to the allocation, we restrict it to the node the page belongs to with __GFP_THISNODE, meaning we do not fallback on other node's zones. Note that gigantic hugetlb pages are fenced off since there is a cyclic dependency between them and alloc_contig_range. Signed-off-by: Oscar Salvador Reviewed-by: Mike Kravetz Acked-by: Michal Hocko --- include/linux/hugetlb.h | 6 +++ mm/compaction.c | 33 ++++++++++++++- mm/hugetlb.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 145 insertions(+), 3 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index cccd1aab69dd..bcff86ca616f 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -583,6 +583,7 @@ struct huge_bootmem_page { struct hstate *hstate; }; +int isolate_or_dissolve_huge_page(struct page *page); struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, @@ -865,6 +866,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; +static inline int isolate_or_dissolve_huge_page(struct page *page) +{ + return -ENOMEM; +} + static inline struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) diff --git a/mm/compaction.c b/mm/compaction.c index 5769753a8f60..9f253fc3b4f9 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -810,6 +810,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, bool skip_on_failure = false; unsigned long next_skip_pfn = 0; bool skip_updated = false; + bool fatal_error = false; + int ret = 0; cc->migrate_pfn = low_pfn; @@ -907,6 +909,32 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, valid_page = page; } + if (PageHuge(page) && cc->alloc_contig) { + ret = isolate_or_dissolve_huge_page(page); + + /* + * Fail isolation in case isolate_or_dissolve_huge_page + * reports an error. In case of -ENOMEM, abort right away. + */ + if (ret < 0) { + /* + * Do not report -EBUSY down the chain. + */ + if (ret == -ENOMEM) + fatal_error = true; + else + ret = 0; + goto isolate_fail; + } + + /* + * Ok, the hugepage was dissolved. Now these pages are + * Buddy and cannot be re-allocated because they are + * isolated. Fall-through as the check below handles + * Buddy pages. + */ + } + /* * Skip if free. We read page order here without zone lock * which is generally unsafe, but the race window is small and @@ -1092,6 +1120,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, */ next_skip_pfn += 1UL << cc->order; } + + if (fatal_error) + break; } /* @@ -1135,7 +1166,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, cc->migrate_pfn = low_pfn; - return 0; + return ret; } /** diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5b1ab1f427c5..3194c1bd9e32 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1035,13 +1035,18 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg) return false; } +static void __enqueue_huge_page(struct list_head *list, struct page *page) +{ + list_move(&page->lru, list); + SetHPageFreed(page); +} + static void enqueue_huge_page(struct hstate *h, struct page *page) { int nid = page_to_nid(page); - list_move(&page->lru, &h->hugepage_freelists[nid]); + __enqueue_huge_page(&h->hugepage_freelists[nid], page); h->free_huge_pages++; h->free_huge_pages_node[nid]++; - SetHPageFreed(page); } static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid) @@ -2245,6 +2250,106 @@ static void restore_reserve_on_error(struct hstate *h, } } +/* + * alloc_and_dissolve_huge_page - Allocate a new page and dissolve the old one + * @h: struct hstate old page belongs to + * @old_page: Old page to dissolve + * Returns 0 on success, otherwise negated error. + */ + +static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page) +{ + gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE; + int nid = page_to_nid(old_page); + struct page *new_page; + int ret = 0; + + /* + * Before dissolving the page, we need to allocate a new one, + * so the pool remains stable. + */ + new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL); + if (!new_page) + return -ENOMEM; + + /* + * Pages got from Buddy are self-refcounted, but free hugepages + * need to have a refcount of 0. + */ + page_ref_dec(new_page); +retry: + spin_lock(&hugetlb_lock); + if (!PageHuge(old_page)) { + /* + * Freed from under us. Drop new_page too. + */ + update_and_free_page(h, new_page); + goto unlock; + } else if (page_count(old_page)) { + /* + * Someone has grabbed the page, fail for now. + */ + ret = -EBUSY; + update_and_free_page(h, new_page); + goto unlock; + } else if (!HPageFreed(old_page)) { + /* + * Page's refcount is 0 but it has not been enqueued in the + * freelist yet. Race window is small, so we can succed here if + * we retry. + */ + spin_unlock(&hugetlb_lock); + cond_resched(); + goto retry; + } else { + /* + * Ok, old_page is still a genuine free hugepage. Replace it + * with the new one. + */ + list_del(&old_page->lru); + update_and_free_page(h, old_page); + /* + * h->free_huge_pages{_node} counters do not need to be updated. + */ + __enqueue_huge_page(&h->hugepage_freelists[nid], new_page); + } +unlock: + spin_unlock(&hugetlb_lock); + + return ret; +} + +int isolate_or_dissolve_huge_page(struct page *page) +{ + struct hstate *h; + struct page *head; + + /* + * The page might have been dissolved from under our feet, so make sure + * to carefully check the state under the lock. + * Return success when racing as if we dissolved the page ourselves. + */ + spin_lock(&hugetlb_lock); + if (PageHuge(page)) { + head = compound_head(page); + h = page_hstate(head); + } else { + spin_unlock(&hugetlb_lock); + return 0; + } + spin_unlock(&hugetlb_lock); + + /* + * Fence off gigantic pages as there is a cyclic dependency between + * alloc_contig_range and them. Return -ENOME as this has the effect + * of bailing out right away without further retrying. + */ + if (hstate_is_gigantic(h)) + return -ENOMEM; + + return alloc_and_dissolve_huge_page(h, head); +} + struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { From patchwork Wed Mar 17 11:12:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 12145559 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D90C2C433E9 for ; Wed, 17 Mar 2021 11:13:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 76CF164F6A for ; Wed, 17 Mar 2021 11:13:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76CF164F6A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 30D3F6B0073; Wed, 17 Mar 2021 07:13:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1F76B6B0074; Wed, 17 Mar 2021 07:13:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 045C36B0075; Wed, 17 Mar 2021 07:13:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0055.hostedemail.com [216.40.44.55]) by kanga.kvack.org (Postfix) with ESMTP id DC5396B0073 for ; Wed, 17 Mar 2021 07:13:07 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9513852B7 for ; Wed, 17 Mar 2021 11:13:07 +0000 (UTC) X-FDA: 77929104414.26.2709820 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf04.hostedemail.com (Postfix) with ESMTP id 072013C4 for ; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E8C72AE05; Wed, 17 Mar 2021 11:13:05 +0000 (UTC) From: Oscar Salvador To: Andrew Morton Cc: Vlastimil Babka , David Hildenbrand , Michal Hocko , Muchun Song , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador Subject: [PATCH v5 4/5] mm: Make alloc_contig_range handle in-use hugetlb pages Date: Wed, 17 Mar 2021 12:12:50 +0100 Message-Id: <20210317111251.17808-5-osalvador@suse.de> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20210317111251.17808-1-osalvador@suse.de> References: <20210317111251.17808-1-osalvador@suse.de> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 072013C4 X-Stat-Signature: 3h8wshxendnpztddb6qg9ofe6t8fdqbd Received-SPF: none (suse.de>: No applicable sender policy available) receiver=imf04; identity=mailfrom; envelope-from=""; helo=mx2.suse.de; client-ip=195.135.220.15 X-HE-DKIM-Result: none/none X-HE-Tag: 1615979586-524021 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: alloc_contig_range() will fail if it finds a HugeTLB page within the range, without a chance to handle them. Since HugeTLB pages can be migrated as any LRU or Movable page, it does not make sense to bail out without trying. Enable the interface to recognize in-use HugeTLB pages so we can migrate them, and have much better chances to succeed the call. Signed-off-by: Oscar Salvador Reviewed-by: Mike Kravetz Acked-by: Michal Hocko --- include/linux/hugetlb.h | 5 +++-- mm/compaction.c | 12 +++++++++++- mm/hugetlb.c | 22 +++++++++++++++++++--- mm/vmscan.c | 5 +++-- 4 files changed, 36 insertions(+), 8 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index bcff86ca616f..a37b4ce86e58 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -583,7 +583,7 @@ struct huge_bootmem_page { struct hstate *hstate; }; -int isolate_or_dissolve_huge_page(struct page *page); +int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, @@ -866,7 +866,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; -static inline int isolate_or_dissolve_huge_page(struct page *page) +static inline int isolate_or_dissolve_huge_page(struct page *page, + struct list_head *list) { return -ENOMEM; } diff --git a/mm/compaction.c b/mm/compaction.c index 9f253fc3b4f9..6e47855fd154 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -910,7 +910,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, } if (PageHuge(page) && cc->alloc_contig) { - ret = isolate_or_dissolve_huge_page(page); + ret = isolate_or_dissolve_huge_page(page, &cc->migratepages); /* * Fail isolation in case isolate_or_dissolve_huge_page @@ -927,6 +927,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_fail; } + if (PageHuge(page)) { + /* + * Hugepage was successfully isolated and placed + * on the cc->migratepages list. + */ + low_pfn += compound_nr(page) - 1; + goto isolate_success_no_list; + } + /* * Ok, the hugepage was dissolved. Now these pages are * Buddy and cannot be re-allocated because they are @@ -1068,6 +1077,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, isolate_success: list_add(&page->lru, &cc->migratepages); +isolate_success_no_list: cc->nr_migratepages += compound_nr(page); nr_isolated += compound_nr(page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3194c1bd9e32..11e86434d8bd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2287,7 +2287,9 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page) goto unlock; } else if (page_count(old_page)) { /* - * Someone has grabbed the page, fail for now. + * Someone has grabbed the page, return -EBUSY so we give + * isolate_or_dissolve_huge_page a chance to handle an in-use + * page. */ ret = -EBUSY; update_and_free_page(h, new_page); @@ -2319,10 +2321,12 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page) return ret; } -int isolate_or_dissolve_huge_page(struct page *page) +int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list) { struct hstate *h; struct page *head; + bool try_again = true; + int ret = -EBUSY; /* * The page might have been dissolved from under our feet, so make sure @@ -2347,7 +2351,19 @@ int isolate_or_dissolve_huge_page(struct page *page) if (hstate_is_gigantic(h)) return -ENOMEM; - return alloc_and_dissolve_huge_page(h, head); +retry: + if (page_count(head) && isolate_huge_page(head, list)) { + ret = 0; + } else if (!page_count(head)) { + ret = alloc_and_dissolve_huge_page(h, head); + + if (ret == -EBUSY && try_again) { + try_again = false; + goto retry; + } + } + + return ret; } struct page *alloc_huge_page(struct vm_area_struct *vma, diff --git a/mm/vmscan.c b/mm/vmscan.c index 562e87cbd7a1..42aaef30633e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1507,8 +1507,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, LIST_HEAD(clean_pages); list_for_each_entry_safe(page, next, page_list, lru) { - if (page_is_file_lru(page) && !PageDirty(page) && - !__PageMovable(page) && !PageUnevictable(page)) { + if (!PageHuge(page) && page_is_file_lru(page) && + !PageDirty(page) && !__PageMovable(page) && + !PageUnevictable(page)) { ClearPageActive(page); list_move(&page->lru, &clean_pages); } From patchwork Wed Mar 17 11:12:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 12145561 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2197AC433E0 for ; Wed, 17 Mar 2021 11:13:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B76F864F69 for ; Wed, 17 Mar 2021 11:13:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B76F864F69 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B1C586B0074; Wed, 17 Mar 2021 07:13:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AF7638D0007; Wed, 17 Mar 2021 07:13:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9493C6B0078; Wed, 17 Mar 2021 07:13:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0242.hostedemail.com [216.40.44.242]) by kanga.kvack.org (Postfix) with ESMTP id 68C2F6B0074 for ; Wed, 17 Mar 2021 07:13:08 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 307308249980 for ; Wed, 17 Mar 2021 11:13:08 +0000 (UTC) X-FDA: 77929104456.30.7BBDE94 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf03.hostedemail.com (Postfix) with ESMTP id AEFBFC0001F7 for ; Wed, 17 Mar 2021 11:13:07 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id AA746AC1F; Wed, 17 Mar 2021 11:13:06 +0000 (UTC) From: Oscar Salvador To: Andrew Morton Cc: Vlastimil Babka , David Hildenbrand , Michal Hocko , Muchun Song , Mike Kravetz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador Subject: [PATCH v5 5/5] mm,page_alloc: Drop unnecessary checks from pfn_range_valid_contig Date: Wed, 17 Mar 2021 12:12:51 +0100 Message-Id: <20210317111251.17808-6-osalvador@suse.de> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20210317111251.17808-1-osalvador@suse.de> References: <20210317111251.17808-1-osalvador@suse.de> X-Stat-Signature: shtnsxt7sdy7jxossj3ioxufzo7ph8pa X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: AEFBFC0001F7 Received-SPF: none (suse.de>: No applicable sender policy available) receiver=imf03; identity=mailfrom; envelope-from=""; helo=mx2.suse.de; client-ip=195.135.220.15 X-HE-DKIM-Result: none/none X-HE-Tag: 1615979587-393755 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: pfn_range_valid_contig() bails out when it finds an in-use page or a hugetlb page, among other things. We can drop the in-use page check since __alloc_contig_pages can migrate away those pages, and the hugetlb page check can go too since isolate_migratepages_range is now capable of dealing with hugetlb pages. Either way, those checks are racy so let the end function handle it when the time comes. Signed-off-by: Oscar Salvador Suggested-by: David Hildenbrand Reviewed-by: David Hildenbrand --- mm/page_alloc.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4cb455355f6d..50d73e68b79e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8685,12 +8685,6 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn, if (PageReserved(page)) return false; - - if (page_count(page) > 0) - return false; - - if (PageHuge(page)) - return false; } return true; }