From patchwork Mon Feb 22 13:51:36 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 12098727
From: Oscar Salvador
To: Andrew Morton
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador
Subject: [PATCH v3 1/2] mm: Make alloc_contig_range handle free hugetlb pages
Date: Mon, 22 Feb 2021 14:51:36 +0100
Message-Id: <20210222135137.25717-2-osalvador@suse.de>
X-Mailer: git-send-email 2.13.7
In-Reply-To: <20210222135137.25717-1-osalvador@suse.de>
References: <20210222135137.25717-1-osalvador@suse.de>

alloc_contig_range will fail if it ever sees a HugeTLB page within the
range we are trying to allocate, even when that page is free and can be
easily reallocated.
This has proved to be problematic for some users of alloc_contig_range,
e.g. CMA and virtio-mem, where the call would fail even when those pages
lie in ZONE_MOVABLE and are free.

We can do better by trying to replace such a page.

Free hugepages are tricky to handle: so that no userspace application
notices a disruption, we need to replace the current free hugepage with
a new one.

In order to do that, a new function called alloc_and_dissolve_huge_page
is introduced.
This function will first try to get a fresh hugepage and, if it
succeeds, it will replace the old one in the free hugepage pool.

All operations are handled under hugetlb_lock, so no races are
possible. The only exception is when the page's refcount is 0 but it
has not yet been flagged as PageHugeFreed.
In this case we retry, as the race window is quite small and we have a
high chance of succeeding next time.

With regard to the allocation, we restrict it to the node the page
belongs to with __GFP_THISNODE, meaning we do not fall back to other
nodes' zones.

Note that gigantic hugetlb pages are fenced off since there is a cyclic
dependency between them and alloc_contig_range.

Signed-off-by: Oscar Salvador
Reviewed-by: Mike Kravetz
Acked-by: Michal Hocko
---
 include/linux/hugetlb.h |   6 +++
 mm/compaction.c         |  12 ++++++
 mm/hugetlb.c            | 111 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b5807f23caf8..72352d718829 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,6 +505,7 @@ struct huge_bootmem_page {
 	struct hstate *hstate;
 };
 
+bool isolate_or_dissolve_huge_page(struct page *page);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -775,6 +776,11 @@ void set_page_huge_active(struct page *page);
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
+static inline bool isolate_or_dissolve_huge_page(struct page *page)
+{
+	return false;
+}
+
 static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   int avoid_reserve)
diff --git a/mm/compaction.c b/mm/compaction.c
index 190ccdaa6c19..d52506ed9db7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -905,6 +905,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			valid_page = page;
 		}
 
+		if (PageHuge(page) && cc->alloc_contig) {
+			if (!isolate_or_dissolve_huge_page(page))
+				goto isolate_fail;
+
+			/*
+			 * Ok, the hugepage was dissolved. Now these pages are
+			 * Buddy and cannot be re-allocated because they are
+			 * isolated. Fall-through as the check below handles
+			 * Buddy pages.
+			 */
+		}
+
 		/*
 		 * Skip if free. We read page order here without zone lock
 		 * which is generally unsafe, but the race window is small and
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..56eba64a1d33 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1037,13 +1037,18 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+static void __enqueue_huge_page(struct list_head *list, struct page *page)
+{
+	list_move(&page->lru, list);
+	SetPageHugeFreed(page);
+}
+
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	list_move(&page->lru, &h->hugepage_freelists[nid]);
+	__enqueue_huge_page(&h->hugepage_freelists[nid], page);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
-	SetPageHugeFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -2294,6 +2299,108 @@ static void restore_reserve_on_error(struct hstate *h,
 	}
 }
 
+/*
+ * alloc_and_dissolve_huge_page - Allocate a new page and dissolve the old one
+ * @h: struct hstate old page belongs to
+ * @old_page: Old page to dissolve
+ * Returns 0 on success, otherwise negated error.
+ */
+
+static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
+{
+	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+	int nid = page_to_nid(old_page);
+	struct page *new_page;
+	int ret = 0;
+
+	/*
+	 * Before dissolving the page, we need to allocate a new one,
+	 * so the pool remains stable.
+	 */
+	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
+	if (!new_page)
+		return -ENOMEM;
+
+	/*
+	 * Pages got from Buddy are self-refcounted, but free hugepages
+	 * need to have a refcount of 0.
+	 */
+	page_ref_dec(new_page);
+retry:
+	spin_lock(&hugetlb_lock);
+	if (!PageHuge(old_page)) {
+		/*
+		 * Freed from under us. Drop new_page too.
+		 */
+		update_and_free_page(h, new_page);
+		goto unlock;
+	} else if (page_count(old_page)) {
+		/*
+		 * Someone has grabbed the page, fail for now.
+		 */
+		ret = -EBUSY;
+		update_and_free_page(h, new_page);
+		goto unlock;
+	} else if (!PageHugeFreed(old_page)) {
+		/*
+		 * Page's refcount is 0 but it has not been enqueued in the
+		 * freelist yet. Race window is small, so we can succeed here if
+		 * we retry.
+		 */
+		spin_unlock(&hugetlb_lock);
+		cond_resched();
+		goto retry;
+	} else {
+		/*
+		 * Ok, old_page is still a genuine free hugepage. Replace it
+		 * with the new one.
+		 */
+		list_del(&old_page->lru);
+		update_and_free_page(h, old_page);
+		/*
+		 * h->free_huge_pages{_node} counters do not need to be updated.
+		 */
+		__enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
+	}
+unlock:
+	spin_unlock(&hugetlb_lock);
+
+	return ret;
+}
+
+bool isolate_or_dissolve_huge_page(struct page *page)
+{
+	struct hstate *h = NULL;
+	struct page *head;
+	bool ret = false;
+
+	spin_lock(&hugetlb_lock);
+	if (PageHuge(page)) {
+		head = compound_head(page);
+		h = page_hstate(head);
+	}
+	spin_unlock(&hugetlb_lock);
+
+	/*
+	 * The page might have been dissolved from under our feet.
+	 * If that is the case, return success as if we dissolved it ourselves.
+	 */
+	if (!h)
+		return true;
+
+	/*
+	 * Fence off gigantic pages as there is a cyclic dependency
+	 * between alloc_contig_range and them.
+	 */
+	if (hstate_is_gigantic(h))
+		return ret;
+
+	if (!page_count(head) && alloc_and_dissolve_huge_page(h, head))
+		ret = true;
+
+	return ret;
+}
+
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {

From patchwork Mon Feb 22 13:51:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 12098729
From: Oscar Salvador
To: Andrew Morton
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador
Subject: [PATCH v3 2/2] mm: Make alloc_contig_range handle in-use hugetlb pages
Date: Mon, 22 Feb 2021 14:51:37 +0100
Message-Id: <20210222135137.25717-3-osalvador@suse.de>
X-Mailer: git-send-email 2.13.7
In-Reply-To: <20210222135137.25717-1-osalvador@suse.de>
References: <20210222135137.25717-1-osalvador@suse.de>

alloc_contig_range() will fail if it finds a HugeTLB page within the
range, without getting a chance to handle it. Since HugeTLB pages can be
migrated like any LRU or Movable page, it does not make sense to bail
out without trying.
Enable the interface to recognize in-use HugeTLB pages so we can migrate
them, which gives the call a much better chance of succeeding.
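
The dispatch this patch adds to isolate_or_dissolve_huge_page() can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions: struct mock_hpage, mock_isolate() and
mock_alloc_and_dissolve() are hypothetical stand-ins for the page state,
isolate_huge_page() and alloc_and_dissolve_huge_page(), and none of it
is kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a huge page head; not a kernel type. */
struct mock_hpage {
	int refcount;       /* models page_count(head) */
	bool isolatable;    /* models whether isolate_huge_page() would succeed */
	int dissolve_err;   /* what the mocked dissolve step returns */
	int dissolve_calls; /* how often the dissolve path was attempted */
};

/* Mocked isolate_huge_page(): succeeds only for pages marked isolatable. */
static bool mock_isolate(struct mock_hpage *p)
{
	return p->isolatable;
}

/*
 * Mocked alloc_and_dissolve_huge_page(): returns 0 on success or a
 * negative errno. On -EBUSY it pretends a racing user took a reference.
 */
static int mock_alloc_and_dissolve(struct mock_hpage *p)
{
	p->dissolve_calls++;
	if (p->dissolve_err == -EBUSY)
		p->refcount = 1;
	return p->dissolve_err;
}

/* Mirrors the control flow added to isolate_or_dissolve_huge_page(). */
static bool model_isolate_or_dissolve(struct mock_hpage *p)
{
	bool try_again = true;

retry:
	if (p->refcount && mock_isolate(p))
		return true;		/* in use: queued for migration */
	if (!p->refcount) {
		int err = mock_alloc_and_dissolve(p);

		if (!err)
			return true;	/* free: replaced and dissolved */
		if (err == -EBUSY && try_again) {
			try_again = false;
			goto retry;	/* became in-use: try isolation once */
		}
	}
	return false;
}
```

The single try_again flag bounds the loop: a free page that turns busy
during the dissolve attempt gets exactly one more pass through the
isolation branch, and any other error fails the call.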

Signed-off-by: Oscar Salvador
Reviewed-by: Mike Kravetz
---
 include/linux/hugetlb.h |  5 +++--
 mm/compaction.c         | 12 +++++++++++-
 mm/hugetlb.c            | 21 +++++++++++++++++----
 mm/vmscan.c             |  5 +++--
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 72352d718829..8c17d0dbc87c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,7 +505,7 @@ struct huge_bootmem_page {
 	struct hstate *hstate;
 };
 
-bool isolate_or_dissolve_huge_page(struct page *page);
+bool isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -776,7 +776,8 @@ void set_page_huge_active(struct page *page);
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
-static inline bool isolate_or_dissolve_huge_page(struct page *page)
+static inline bool isolate_or_dissolve_huge_page(struct page *page,
+						 struct list_head *list)
 {
 	return false;
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index d52506ed9db7..6d9169e71d61 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -906,9 +906,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		if (PageHuge(page) && cc->alloc_contig) {
-			if (!isolate_or_dissolve_huge_page(page))
+			if (!isolate_or_dissolve_huge_page(page, &cc->migratepages))
 				goto isolate_fail;
 
+			if (PageHuge(page)) {
+				/*
+				 * Hugepage was successfully isolated and placed
+				 * on the cc->migratepages list.
+				 */
+				low_pfn += compound_nr(page) - 1;
+				goto isolate_success_no_list;
+			}
+
 			/*
 			 * Ok, the hugepage was dissolved. Now these pages are
 			 * Buddy and cannot be re-allocated because they are
@@ -1053,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
+isolate_success_no_list:
 		cc->nr_migratepages += compound_nr(page);
 		nr_isolated += compound_nr(page);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 56eba64a1d33..95dd54cd53c0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2336,7 +2336,9 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
 		goto unlock;
 	} else if (page_count(old_page)) {
 		/*
-		 * Someone has grabbed the page, fail for now.
+		 * Someone has grabbed the page, return -EBUSY so we give
+		 * isolate_or_dissolve_huge_page a chance to handle an in-use
+		 * page.
 		 */
 		ret = -EBUSY;
 		update_and_free_page(h, new_page);
@@ -2368,11 +2370,12 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
 	return ret;
 }
 
-bool isolate_or_dissolve_huge_page(struct page *page)
+bool isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 {
 	struct hstate *h = NULL;
 	struct page *head;
 	bool ret = false;
+	bool try_again = true;
 
 	spin_lock(&hugetlb_lock);
 	if (PageHuge(page)) {
@@ -2394,9 +2397,19 @@ bool isolate_or_dissolve_huge_page(struct page *page)
 	 */
 	if (hstate_is_gigantic(h))
 		return ret;
-
-	if (!page_count(head) && alloc_and_dissolve_huge_page(h, head))
+retry:
+	if (page_count(head) && isolate_huge_page(head, list)) {
 		ret = true;
+	} else if (!page_count(head)) {
+		int err = alloc_and_dissolve_huge_page(h, head);
+
+		if (!err) {
+			ret = true;
+		} else if (err == -EBUSY && try_again) {
+			try_again = false;
+			goto retry;
+		}
+	}
 
 	return ret;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b1b574ad199d..0803adca4469 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1506,8 +1506,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 	LIST_HEAD(clean_pages);
 
 	list_for_each_entry_safe(page, next, page_list, lru) {
-		if (page_is_file_lru(page) && !PageDirty(page) &&
-		    !__PageMovable(page) && !PageUnevictable(page)) {
+		if (!PageHuge(page) && page_is_file_lru(page) &&
+		    !PageDirty(page) && !__PageMovable(page) &&
+		    !PageUnevictable(page)) {
 			ClearPageActive(page);
 			list_move(&page->lru, &clean_pages);
 		}
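
For readers following the series, the four-way check that
alloc_and_dissolve_huge_page() performs under hugetlb_lock can be
summarised as a small classification function. This is a user-space
sketch only: struct old_page_state, enum replace_action,
classify_old_page() and action_result() are hypothetical names
introduced for illustration, and the three fields merely stand in for
PageHuge(), page_count() and PageHugeFreed().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the old page's state; not a kernel structure. */
struct old_page_state {
	bool is_huge;     /* models PageHuge(old_page) */
	int refcount;     /* models page_count(old_page) */
	bool huge_freed;  /* models PageHugeFreed(old_page) */
};

enum replace_action {
	DROP_NEW_PAGE,    /* old page already dissolved; result is 0 */
	FAIL_BUSY,        /* someone holds a reference; result is -EBUSY */
	RETRY,            /* refcount is 0 but the page is not enqueued yet */
	REPLACE,          /* genuine free hugepage; swap in the new page */
};

/* Same order of checks as the if/else chain taken under hugetlb_lock. */
static enum replace_action classify_old_page(const struct old_page_state *s)
{
	if (!s->is_huge)
		return DROP_NEW_PAGE;
	if (s->refcount)
		return FAIL_BUSY;
	if (!s->huge_freed)
		return RETRY;
	return REPLACE;
}

/* Return value the function would produce for a terminal action. */
static int action_result(enum replace_action a)
{
	return a == FAIL_BUSY ? -EBUSY : 0;
}
```

The order matters: the refcount is only meaningful once the page is
known to still be a hugepage, and the PageHugeFreed test only separates
"free and enqueued" from "free but still in flight" once the refcount
is known to be 0. RETRY is the one non-terminal outcome, matching the
unlock, cond_resched() and goto retry sequence in the patch.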