From patchwork Thu Jun 20 00:59:50 2024
X-Patchwork-Submitter: Ge Yang
X-Patchwork-Id: 13704784
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com,
 baolin.wang@linux.alibaba.com, mgorman@techsingularity.net,
 liuzixing@hygon.cn, yangge, stable@vger.kernel.org
Subject: [PATCH V2] mm/page_alloc: Separate THP PCP into movable and
 non-movable categories
Date: Thu, 20 Jun 2024 08:59:50 +0800
Message-Id: <1718845190-4456-1-git-send-email-yangge1116@126.com>
X-Mailer: git-send-email 2.7.4

From: yangge

Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations"), the THP-sized PCP list no longer differentiates
pages by migration type, so a non-movable allocation request may get a
CMA page from the list. In some cases this is not acceptable.

If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of system memory), starting a
virtual machine with device passthrough will get stuck. While the
virtual machine starts, pin_user_pages_remote(..., FOLL_LONGTERM, ...)
is called to pin memory. Normally, if a page is present and in a CMA
area, pin_user_pages_remote() migrates the page from the CMA area to a
non-CMA area because of the FOLL_LONGTERM flag. But if non-movable
allocation requests return CMA memory,
migrate_longterm_unpinnable_pages() will migrate a CMA page to another
CMA page, which fails the check in check_and_migrate_movable_pages()
and makes the migration loop endlessly.

Call trace:
pin_user_pages_remote
--__gup_longterm_locked // endless loops in this function
----_get_user_pages_locked
----check_and_migrate_movable_pages
------migrate_longterm_unpinnable_pages
--------alloc_migration_target

This problem also has a negative impact on CMA itself. For example,
when CMA memory is borrowed by THP and we need to reclaim it through
cma_alloc() or dma_alloc_coherent(), we must move those pages out so
that CMA's users can retrieve that contiguous memory. Currently, CMA
memory can be occupied by non-movable pages, which we can't relocate.
As a result, cma_alloc() is more likely to fail.

To fix the problem above, we add one PCP list for THP, which does not
introduce a new cacheline for struct per_cpu_pages. THP will have 2 PCP
lists: one is used by MOVABLE allocations and the other is used by
UNMOVABLE allocations.
MOVABLE allocations are those with GFP_MOVABLE; UNMOVABLE allocations
are those with GFP_UNMOVABLE or GFP_RECLAIMABLE.

Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Cc:
Signed-off-by: yangge
---
V2:
- Change the commit title
- Add Cc to stable

 include/linux/mmzone.h | 9 ++++-----
 mm/page_alloc.c        | 9 +++++++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b7546dd..cb7f265 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -656,13 +656,12 @@ enum zone_watermarks {
 };
 
 /*
- * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
- * for THP which will usually be GFP_MOVABLE. Even if it is another type,
- * it should not contribute to serious fragmentation causing THP allocation
- * failures.
+ * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. Two additional lists
+ * are added for THP. One PCP list is used by GPF_MOVABLE, and the other PCP list
+ * is used by GFP_UNMOVABLE and GFP_RECLAIMABLE.
  */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define NR_PCP_THP 2
 #else
 #define NR_PCP_THP 0
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8f416a0..0a837e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -504,10 +504,15 @@ static void bad_page(struct page *page, const char *reason)
 
 static inline unsigned int order_to_pindex(int migratetype, int order)
 {
+	bool __maybe_unused movable;
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
 		VM_BUG_ON(order != HPAGE_PMD_ORDER);
-		return NR_LOWORDER_PCP_LISTS;
+
+		movable = migratetype == MIGRATE_MOVABLE;
+
+		return NR_LOWORDER_PCP_LISTS + movable;
 	}
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -521,7 +526,7 @@ static inline int pindex_to_order(unsigned int pindex)
 	int order = pindex / MIGRATE_PCPTYPES;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex == NR_LOWORDER_PCP_LISTS)
+	if (pindex >= NR_LOWORDER_PCP_LISTS)
 		order = HPAGE_PMD_ORDER;
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
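
Illustration only, not part of the patch: the index mapping changed by the
two page_alloc.c hunks can be modelled in user space roughly as below. This
is a sketch under assumptions: the constants are meant to mirror the kernel's
values (three PCP migratetypes, PAGE_ALLOC_COSTLY_ORDER of 3), HPAGE_PMD_ORDER
is assumed to be 9 as on x86-64 with 4K pages, the CONFIG_TRANSPARENT_HUGEPAGE
case is hard-wired, and assert() stands in for VM_BUG_ON().

```c
#include <assert.h>

/* Assumed to mirror the kernel's enum migratetype ordering. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES	/* number of types kept on PCP lists */
};

#define PAGE_ALLOC_COSTLY_ORDER	3
#define HPAGE_PMD_ORDER		9	/* assumption: x86-64, 4K pages */
#define NR_PCP_THP		2	/* value introduced by this patch */
#define NR_LOWORDER_PCP_LISTS	(MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
#define NR_PCP_LISTS		(NR_LOWORDER_PCP_LISTS + NR_PCP_THP)

/* Map (migratetype, order) to a PCP list index. */
static unsigned int order_to_pindex(int migratetype, int order)
{
	if (order > PAGE_ALLOC_COSTLY_ORDER) {
		/* Only THP-sized requests may exceed the costly order. */
		assert(order == HPAGE_PMD_ORDER);
		/* First THP slot: non-movable, second THP slot: movable. */
		return NR_LOWORDER_PCP_LISTS + (migratetype == MIGRATE_MOVABLE);
	}
	return MIGRATE_PCPTYPES * order + migratetype;
}

/* Inverse direction: recover the order from a PCP list index. */
static int pindex_to_order(unsigned int pindex)
{
	/* ">=" covers both THP slots; "==" would miss the second one. */
	if (pindex >= NR_LOWORDER_PCP_LISTS)
		return HPAGE_PMD_ORDER;
	return pindex / MIGRATE_PCPTYPES;
}
```

Under these assumptions, THP-sized UNMOVABLE and RECLAIMABLE requests share
index 12 while MOVABLE requests use index 13. With the old "==" comparison,
index 13 would be decoded as order 4 (13 / 3) rather than HPAGE_PMD_ORDER,
which is why the second hunk switches to ">=".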