From patchwork Mon Apr 15 08:12:18 2024
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Andrew Morton
Cc: Huang Ying, Mel Gorman, Ryan Roberts, David Hildenbrand, Barry Song,
    Vlastimil Babka,
Yan , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Yang Shi , Yu Zhao , , Kefeng Wang Subject: [PATCH rfc 1/3] mm: prepare more high-order pages to be stored on the per-cpu lists Date: Mon, 15 Apr 2024 16:12:18 +0800 Message-ID: <20240415081220.3246839-2-wangkefeng.wang@huawei.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20240415081220.3246839-1-wangkefeng.wang@huawei.com> References: <20240415081220.3246839-1-wangkefeng.wang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.112.125] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggpemm100001.china.huawei.com (7.185.36.93) X-Rspamd-Queue-Id: 9C4641C0005 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: insn36gofeqxre84b76ei6ktiqpum3ar X-HE-Tag: 1713168766-382826 X-HE-Meta: U2FsdGVkX1+ftLXlz6dtQNQIjSO8NqktfGTyZLSYzX5d56BXhQMTc5H/98SipFu3T5haB/6RSGAzWuHxjJdnRz1wY1ILs0nYrXF7nQAG6g6AalsnkxAk/Ri6c6B6Xouf/igXWZPdBB4WqJnCh8edZOE102xsXhQjLbtOWGfApgi+FqW5MqwTBVhD6gTBsE8WVPxhxy9+vjnLb9DvF5oUJAKEXmmsBvcgDOMAusAw08nWIN6bev4YYn5xoih7T3IPzXA/8n6YDH6plLlAnKQXoWvdxIjRMYpvPFsbqa1VVdudh4vI6sTt1sr+IZp2Q/IqeuTi3O3bswJ3HZREi0604A2MGYhDgvSIgBVlDGv04+5yqe58J3mOMyuPCrFbbtCdwGk3EB0kLteJ3XVQb3z8nuuelVD5XWXnprF2xs1l/Io2xHj8rYw9lr5w8bqiABtb5E/dX3mqLThS4u5FOEmRg0RZkYK9sd1tL/kJc5gdmyDeESt0atr28PYSM76tPfeuiNYXFrWJ1XmN6sJmLn1pYpD8mIsD4YDR2CvUtKTTccl4jWjlfdG77PvK9uxXunM63kZhTi8Fov+/JUaQZx8LTUsQjMHFqusEVtGdhOvPHtr5bx6b/XUwlTMxx1dKOBtn64A1ynVzqm/z7D1wzpx31G8rHS6M1g+8TpySC/J5pe898CThnmnP6jgCveXZhGaSgyeWA/JCS4k0YK1dHUkPHtLUgGfecCf5idYp71qPl2O6Ssp71LzXvTNKEhFpFwDhu7DW8v/1hUMeL9IpFzv8ijCDtrM06qbBKRtavkOcwxxvY9iUBHqonTPS2HgP0uLhRxosgo7qBeKiUGT2SpjnogFFRZ9zsBsCRKn4TUdrnC0pk0vY3bBMRi42MYv04MSAQRfaIi9TT9i3rvg9KBfXjgNoUL1MfyfcBKJi5c/9xcFofQluSSADYT7sT9NJhEWn8AJCSEwaAHrlv8W7Cgx 6THIXo+T gajgj92C1hg8P3nVtCCUm5RxuiHgIy+m2NBiQ9LvCGEdmJfyoOexcGvNHzAtN+5Vu047DykK54MmzBa5dTo2UIwz4llPGKTWKLK7x67ZrXqwxIuPcTsBg+k31cD4Ssrhp7JKgqIBw0CII1H5gp3b5CPvc/g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Both the file pages and anonymous pages support large folio, high-order pages except HPAGE_PMD_ORDER(PMD_SHIFT - PAGE_SHIFT) will be allocated frequently which will increase the zone lock contention, allow high-order pages on pcp lists could alleviate the big zone lock contention, in order to allows high-orders(PAGE_ALLOC_COSTLY_ORDER, HPAGE_PMD_ORDER) to be stored on the per-cpu lists, similar with PMD_ORDER pages, more lists is added in struct per_cpu_pages (one list each high-order pages), also a new PCP_MAX_ORDER instead of HPAGE_PMD_ORDER is added in mmzone.h. But as commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists") pointed, it may not win in all the scenes, so this don't allow higher-order pages to be added to PCP list, the next will add a control to enable or disable it. The struct per_cpu_pages increases in size from 256(4 cache lines) to 320 bytes (5 cache lines) on arm64 with defconfig. Signed-off-by: Kefeng Wang --- include/linux/mmzone.h | 4 +++- mm/page_alloc.c | 10 +++++----- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c11b7cde81ef..c745e2f1a0f2 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -657,11 +657,13 @@ enum zone_watermarks { * failures. 
  */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define PCP_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define NR_PCP_THP (PCP_MAX_ORDER - PAGE_ALLOC_COSTLY_ORDER)
 #else
 #define NR_PCP_THP 0
 #endif
 #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
+#define HIGHORDER_PCP_LIST_INDEX (NR_LOWORDER_PCP_LISTS - (PAGE_ALLOC_COSTLY_ORDER + 1))
 #define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)
 
 #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b51becf03d1e..2248afc7b73a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -506,8 +506,8 @@ static inline unsigned int order_to_pindex(int migratetype, int order)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
-		VM_BUG_ON(order != HPAGE_PMD_ORDER);
-		return NR_LOWORDER_PCP_LISTS;
+		VM_BUG_ON(order > PCP_MAX_ORDER);
+		return order + HIGHORDER_PCP_LIST_INDEX;
 	}
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -521,8 +521,8 @@ static inline int pindex_to_order(unsigned int pindex)
 	int order = pindex / MIGRATE_PCPTYPES;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex == NR_LOWORDER_PCP_LISTS)
-		order = HPAGE_PMD_ORDER;
+	if (pindex >= NR_LOWORDER_PCP_LISTS)
+		order = pindex - HIGHORDER_PCP_LIST_INDEX;
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
 #endif
@@ -535,7 +535,7 @@ static inline bool pcp_allowed_order(unsigned int order)
 	if (order <= PAGE_ALLOC_COSTLY_ORDER)
 		return true;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (order == HPAGE_PMD_ORDER)
+	if (order == PCP_MAX_ORDER)
 		return true;
 #endif
 	return false;
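
As a sanity check of the index math above (not part of the patch): a
minimal userspace sketch, assuming PAGE_ALLOC_COSTLY_ORDER = 3,
MIGRATE_PCPTYPES = 3 and a PMD order of 9 (typical for 4K base pages),
mirrors order_to_pindex()/pindex_to_order() and verifies the round trip:

	/* Userspace model of the pindex mapping after this patch; the
	 * constants are illustrative assumptions, not kernel ABI. */
	#include <assert.h>
	#include <stdio.h>

	#define MIGRATE_PCPTYPES		3
	#define PAGE_ALLOC_COSTLY_ORDER		3
	#define PCP_MAX_ORDER			9	/* PMD_SHIFT - PAGE_SHIFT, 4K pages */
	#define NR_LOWORDER_PCP_LISTS		(MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
	#define HIGHORDER_PCP_LIST_INDEX	(NR_LOWORDER_PCP_LISTS - (PAGE_ALLOC_COSTLY_ORDER + 1))
	#define NR_PCP_THP			(PCP_MAX_ORDER - PAGE_ALLOC_COSTLY_ORDER)
	#define NR_PCP_LISTS			(NR_LOWORDER_PCP_LISTS + NR_PCP_THP)

	static unsigned int order_to_pindex(int migratetype, int order)
	{
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			return order + HIGHORDER_PCP_LIST_INDEX; /* orders 4..9 -> 12..17 */
		return (MIGRATE_PCPTYPES * order) + migratetype; /* pindex 0..11 */
	}

	static int pindex_to_order(unsigned int pindex)
	{
		if (pindex >= NR_LOWORDER_PCP_LISTS)
			return pindex - HIGHORDER_PCP_LIST_INDEX;
		return pindex / MIGRATE_PCPTYPES;
	}

	int main(void)
	{
		for (int order = 0; order <= PCP_MAX_ORDER; order++) {
			unsigned int pindex = order_to_pindex(0, order);

			assert(pindex < NR_PCP_LISTS);		/* 18 lists total */
			assert(pindex_to_order(pindex) == order);
			printf("order %d -> pindex %u\n", order, pindex);
		}
		return 0;
	}

With these values the six high orders (4..9) land on dedicated lists 12..17,
one list per order, while the low orders keep their per-migratetype lists.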
From patchwork Mon Apr 15 08:12:19 2024
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Andrew Morton
Cc: Huang Ying, Mel Gorman, Ryan Roberts, David Hildenbrand, Barry Song,
    Vlastimil Babka, Zi Yan, "Matthew Wilcox (Oracle)", Jonathan Corbet,
    Yang Shi, Yu Zhao, Kefeng Wang
Subject: [PATCH rfc 2/3] mm: add control to allow specified high-order pages
 stored on PCP list
Date: Mon, 15 Apr 2024 16:12:19 +0800
Message-ID: <20240415081220.3246839-3-wangkefeng.wang@huawei.com>
In-Reply-To: <20240415081220.3246839-1-wangkefeng.wang@huawei.com>
References: <20240415081220.3246839-1-wangkefeng.wang@huawei.com>
Storing high-order pages on the PCP lists may not always win, and can even
hurt some workloads, so it is disabled by default for high orders other
than PMD_ORDER. Since there are already per-supported-THP-size interfaces
to configure mTHP behaviours, add a new pcp_enabled control under those
interfaces to let the user enable or disable storing pages of the given
high order on the PCP lists. It cannot change the existing behaviour for
order == PMD_ORDER or order <= PAGE_ALLOC_COSTLY_ORDER: those are always
enabled and cannot be disabled. When pcp_enabled disables one of the other
high orders, the pcplists are drained.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 11 +++++
 include/linux/gfp.h                        |  1 +
 include/linux/huge_mm.h                    |  1 +
 mm/huge_memory.c                           | 47 ++++++++++++++++++++++
 mm/page_alloc.c                            | 16 ++++++++
 5 files changed, 76 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 04eb45a2f940..3cb91336f81a 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -189,6 +189,17 @@ madvise
 never
 	should be self-explanatory.
 
+There's also a sysfs knob to control whether hugepages of a given high
+order (greater than PAGE_ALLOC_COSTLY_ORDER) may be stored on the PCP
+lists, which can reduce zone lock contention when high-order pages are
+allocated frequently. Please note that the PCP behavior of low-order and
+PMD-order pages cannot be changed; it is possible to enable the other
+high orders by writing 1 or disable them again by writing 0::
+
+	echo 0 >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/pcp_enabled
+	echo 1 >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/pcp_enabled
+
 By default kernel tries to use huge, PMD-mappable zero page on read page
 fault to anonymous mapping. It's possible to disable huge zero page by
 writing 0 or enable it back by writing 1::
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 450c2cbcf04b..2ae1157abd6e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -365,6 +365,7 @@ extern void page_frag_free(void *addr);
 void page_alloc_init_cpuhp(void);
 int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
+void drain_all_zone_pages(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b67294d5814f..86306becfd52 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -108,6 +108,7 @@ extern unsigned long transparent_hugepage_flags;
 extern unsigned long huge_anon_orders_always;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
+extern unsigned long huge_pcp_allow_orders;
 
 static inline bool hugepage_global_enabled(void)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9a1b57ef9c60..9b8a8aa36526 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -512,8 +512,49 @@ static ssize_t thpsize_enabled_store(struct kobject *kobj,
 static struct kobj_attribute thpsize_enabled_attr =
 	__ATTR(enabled, 0644, thpsize_enabled_show, thpsize_enabled_store);
 
+unsigned long huge_pcp_allow_orders __read_mostly;
+static ssize_t thpsize_pcp_enabled_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+
+	return sysfs_emit(buf, "%d\n",
+			  !!test_bit(order, &huge_pcp_allow_orders));
+}
+
+static ssize_t thpsize_pcp_enabled_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	unsigned long value;
+	int ret;
+
+	if (order <= PAGE_ALLOC_COSTLY_ORDER || order == PMD_ORDER)
+		return -EINVAL;
+
+	ret = kstrtoul(buf, 10, &value);
+	if (ret < 0)
+		return ret;
+	if (value > 1)
+		return -EINVAL;
+
+	if (value) {
+		set_bit(order, &huge_pcp_allow_orders);
+	} else {
+		if (test_and_clear_bit(order, &huge_pcp_allow_orders))
+			drain_all_zone_pages();
+	}
+
+	return count;
+}
+
+static struct kobj_attribute thpsize_pcp_enabled_attr = __ATTR(pcp_enabled,
+	0644, thpsize_pcp_enabled_show, thpsize_pcp_enabled_store);
+
 static struct attribute *thpsize_attrs[] = {
 	&thpsize_enabled_attr.attr,
+	&thpsize_pcp_enabled_attr.attr,
 	NULL,
 };
 
@@ -624,6 +665,8 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 */
 	huge_anon_orders_inherit = BIT(PMD_ORDER);
 
+	huge_pcp_allow_orders = BIT(PMD_ORDER);
+
 	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
 	if (unlikely(!*hugepage_kobj)) {
 		pr_err("failed to create transparent hugepage kobject\n");
@@ -658,6 +701,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 			err = PTR_ERR(thpsize);
 			goto remove_all;
 		}
+
+		if (order <= PAGE_ALLOC_COSTLY_ORDER)
+			huge_pcp_allow_orders |= BIT(order);
+
 		list_add(&thpsize->node, &thpsize_list);
 		order = next_order(&orders, order);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2248afc7b73a..25fd3fe30cb0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -537,6 +537,8 @@ static inline bool pcp_allowed_order(unsigned int order)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order == PCP_MAX_ORDER)
 		return true;
+	if (BIT(order) & huge_pcp_allow_orders)
+		return true;
 #endif
 	return false;
 }
@@ -6705,6 +6707,20 @@ void zone_pcp_reset(struct zone *zone)
 	}
 }
 
+void drain_all_zone_pages(void)
+{
+	struct zone *zone;
+
+	mutex_lock(&pcp_batch_high_lock);
+	for_each_populated_zone(zone)
+		__zone_set_pageset_high_and_batch(zone, 0, 0, 1);
+	__drain_all_pages(NULL, true);
+	for_each_populated_zone(zone)
+		__zone_set_pageset_high_and_batch(zone, zone->pageset_high_min,
+			zone->pageset_high_max, zone->pageset_batch);
+	mutex_unlock(&pcp_batch_high_lock);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be in a single zone, must not contain holes,
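
As a usage illustration (not part of the patch): with the knob in place,
the allocator gate reduces to a bit test against huge_pcp_allow_orders. A
minimal userspace sketch of that gating, assuming PAGE_ALLOC_COSTLY_ORDER
= 3 and a PMD order of 9, so that hugepages-64kB corresponds to order 4
with 4K base pages:

	/* Userspace model of pcp_allowed_order() after patches 1-2; the
	 * constants and the hugepages-64kB/order-4 mapping are assumptions. */
	#include <stdbool.h>
	#include <stdio.h>

	#define PAGE_ALLOC_COSTLY_ORDER	3
	#define PMD_ORDER		9
	#define PCP_MAX_ORDER		PMD_ORDER

	/* Boot-time default from hugepage_init_sysfs(): PMD_ORDER plus the
	 * always-enabled low orders. */
	static unsigned long huge_pcp_allow_orders =
		(1UL << PMD_ORDER) | ((1UL << (PAGE_ALLOC_COSTLY_ORDER + 1)) - 1);

	static bool pcp_allowed_order(unsigned int order)
	{
		if (order <= PAGE_ALLOC_COSTLY_ORDER)
			return true;
		if (order == PCP_MAX_ORDER)
			return true;
		if ((1UL << order) & huge_pcp_allow_orders)
			return true;
		return false;
	}

	int main(void)
	{
		/* echo 1 > .../hugepages-64kB/pcp_enabled */
		huge_pcp_allow_orders |= 1UL << 4;

		for (unsigned int order = 0; order <= PMD_ORDER; order++)
			printf("order %u: %s\n", order,
			       pcp_allowed_order(order) ? "pcp list" : "buddy only");

		/* echo 0 > ... would clear the bit and drain the pcplists. */
		return 0;
	}

Writing 0 additionally drains the pcplists (drain_all_zone_pages() above),
so no pages of a newly disabled order linger on the per-cpu lists.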
From patchwork Mon Apr 15 08:12:20 2024
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Andrew Morton
Cc: Huang Ying, Mel Gorman, Ryan Roberts, David Hildenbrand, Barry Song,
    Vlastimil Babka, Zi Yan, "Matthew Wilcox (Oracle)", Jonathan Corbet,
    Yang Shi, Yu Zhao, Kefeng Wang
Subject: [PATCH rfc 3/3] mm: pcp: show per-order pages count
Date: Mon, 15 Apr 2024 16:12:20 +0800
Message-ID: <20240415081220.3246839-4-wangkefeng.wang@huawei.com>
In-Reply-To: <20240415081220.3246839-1-wangkefeng.wang@huawei.com>
References: <20240415081220.3246839-1-wangkefeng.wang@huawei.com>

THIS IS ONLY FOR DEBUG. Show more detail about the per-order page counts
on each CPU in zoneinfo, and add a new pcp_order_stat in sysfs that shows
the total count for each hugepage size.

#cat /proc/zoneinfo
....
cpu: 15
  count: 275
  high:  529
  batch: 63
 order0: 59
 order1: 28
 order2: 28
 order3: 6
 order4: 0
 order5: 0
 order6: 0
 order7: 0
 order8: 0
 order9: 0

#cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/pcp_order_stat
10

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mmzone.h |  6 ++++++
 include/linux/vmstat.h | 19 +++++++++++++++++++
 mm/Kconfig.debug       |  8 ++++++++
 mm/huge_memory.c       | 27 +++++++++++++++++++++++++++
 mm/page_alloc.c        |  4 ++++
 mm/vmstat.c            | 16 ++++++++++++++++
 6 files changed, 80 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c745e2f1a0f2..c32c01468a77 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -665,6 +665,9 @@ enum zone_watermarks {
 #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
 #define HIGHORDER_PCP_LIST_INDEX (NR_LOWORDER_PCP_LISTS - (PAGE_ALLOC_COSTLY_ORDER + 1))
 #define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)
+#ifdef CONFIG_PCP_ORDER_STATS
+#define NR_PCP_ORDER (PAGE_ALLOC_COSTLY_ORDER + NR_PCP_THP + 1)
+#endif
 
 #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
 #define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost)
@@ -702,6 +705,9 @@ struct per_cpu_pages {
 
 	/* Lists of pages, one per migrate type stored on the pcp-lists */
 	struct list_head lists[NR_PCP_LISTS];
+#ifdef CONFIG_PCP_ORDER_STATS
+	int per_order_count[NR_PCP_ORDER];	/* per-order page counts */
+#endif
 } ____cacheline_aligned_in_smp;
 
 struct per_cpu_zonestat {
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 735eae6e272c..91843f2d327f 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -624,4 +624,23 @@ static inline void lruvec_stat_sub_folio(struct folio *folio,
 {
 	lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
 }
+
+static inline void pcp_order_stat_mod(struct per_cpu_pages *pcp, int order,
+				      int val)
+{
+#ifdef CONFIG_PCP_ORDER_STATS
+	pcp->per_order_count[order] += val;
+#endif
+}
+
+static inline void pcp_order_stat_inc(struct per_cpu_pages *pcp, int order)
+{
+	pcp_order_stat_mod(pcp, order, 1);
+}
+
+static inline void pcp_order_stat_dec(struct per_cpu_pages *pcp, int order)
+{
+	pcp_order_stat_mod(pcp, order, -1);
+}
+
 #endif /* _LINUX_VMSTAT_H */
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index afc72fde0f03..57eef0ce809b 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -276,3 +276,11 @@ config PER_VMA_LOCK_STATS
 	  overhead in the page fault path.
 
 	  If in doubt, say N.
+
+config PCP_ORDER_STATS
+	bool "Statistics for per-order of PCP (Per-CPU pageset)"
+	help
+	  Say Y to show per-order statistics of Per-CPU pageset from zoneinfo
+	  and pcp_order_stat in sysfs.
+
+	  If in doubt, say N.
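
A note on the sizing (not from the patch itself): with PAGE_ALLOC_COSTLY_ORDER
= 3 and NR_PCP_THP = 6 from patch 1, NR_PCP_ORDER is 3 + 6 + 1 = 10, matching
the order0..order9 lines in the zoneinfo sample above. A small userspace
sketch of how the pcp_order_stat read handler added below aggregates the
per-CPU counters (the zone/CPU layout and counts are made-up assumptions):

	/* Userspace model of the pcp_order_stat aggregation; the zone/cpu
	 * layout and counter values are fabricated for illustration. */
	#include <stdio.h>

	#define NR_PCP_ORDER	10	/* PAGE_ALLOC_COSTLY_ORDER + NR_PCP_THP + 1 */
	#define NR_CPUS		2
	#define NR_ZONES	2

	struct per_cpu_pages {
		int per_order_count[NR_PCP_ORDER];	/* per-order page counts */
	};

	static struct per_cpu_pages pcp[NR_ZONES][NR_CPUS];

	/* Sum one order across every zone and online CPU, as the sysfs
	 * read handler does. */
	static unsigned int pcp_order_stat(int order)
	{
		unsigned int counts = 0;

		for (int zone = 0; zone < NR_ZONES; zone++)
			for (int cpu = 0; cpu < NR_CPUS; cpu++)
				counts += pcp[zone][cpu].per_order_count[order];
		return counts;
	}

	int main(void)
	{
		pcp[0][0].per_order_count[4] = 6;	/* e.g. six order-4 (64K) pages */
		pcp[1][1].per_order_count[4] = 4;

		printf("%u\n", pcp_order_stat(4));	/* prints 10, like the example */
		return 0;
	}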
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9b8a8aa36526..0c6262bb8fe4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -599,12 +599,39 @@ DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
 DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
 DEFINE_MTHP_STAT_ATTR(anon_swpin_refault, MTHP_STAT_ANON_SWPIN_REFAULT);
 
+#ifdef CONFIG_PCP_ORDER_STATS
+static ssize_t pcp_order_stat_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+	unsigned int counts = 0;
+	struct zone *zone;
+
+	for_each_populated_zone(zone) {
+		struct per_cpu_pages *pcp;
+		int i;
+
+		for_each_online_cpu(i) {
+			pcp = per_cpu_ptr(zone->per_cpu_pageset, i);
+			counts += pcp->per_order_count[order];
+		}
+	}
+
+	return sysfs_emit(buf, "%u\n", counts);
+}
+
+static struct kobj_attribute pcp_order_stat_attr = __ATTR_RO(pcp_order_stat);
+#endif
+
 static struct attribute *stats_attrs[] = {
 	&anon_alloc_attr.attr,
 	&anon_alloc_fallback_attr.attr,
 	&anon_swpout_attr.attr,
 	&anon_swpout_fallback_attr.attr,
 	&anon_swpin_refault_attr.attr,
+#ifdef CONFIG_PCP_ORDER_STATS
+	&pcp_order_stat_attr.attr,
+#endif
 	NULL,
 };
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 25fd3fe30cb0..f44cdf8dec50 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1185,6 +1185,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			list_del(&page->pcp_list);
 			count -= nr_pages;
 			pcp->count -= nr_pages;
+			pcp_order_stat_dec(pcp, order);
 
 			__free_one_page(page, pfn, zone, order, mt, FPI_NONE);
 			trace_mm_page_pcpu_drain(page, order, mt);
@@ -2560,6 +2561,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	pindex = order_to_pindex(migratetype, order);
 	list_add(&page->pcp_list, &pcp->lists[pindex]);
 	pcp->count += 1 << order;
+	pcp_order_stat_inc(pcp, order);
 
 	batch = READ_ONCE(pcp->batch);
 	/*
@@ -2957,6 +2959,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 					migratetype, alloc_flags);
 
 		pcp->count += alloced << order;
+		pcp_order_stat_mod(pcp, order, alloced);
 		if (unlikely(list_empty(list)))
 			return NULL;
 	}
@@ -2964,6 +2967,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 		page = list_first_entry(list, struct page, pcp_list);
 		list_del(&page->pcp_list);
 		pcp->count -= 1 << order;
+		pcp_order_stat_dec(pcp, order);
 	} while (check_new_pages(page, order));
 
 	return page;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index db79935e4a54..632bb1ed6a53 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1674,6 +1674,19 @@ static bool is_zone_first_populated(pg_data_t *pgdat, struct zone *zone)
 	return false;
 }
 
+static void zoneinfo_show_pcp_order_stat(struct seq_file *m,
+					 struct per_cpu_pages *pcp)
+{
+#ifdef CONFIG_PCP_ORDER_STATS
+	int j;
+
+	for (j = 0; j < NR_PCP_ORDER; j++)
+		seq_printf(m,
+			   "\n order%d: %i",
+			   j, pcp->per_order_count[j]);
+#endif
+}
+
 static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 						struct zone *zone)
 {
@@ -1748,6 +1761,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 			   pcp->count,
 			   pcp->high,
 			   pcp->batch);
+
+		zoneinfo_show_pcp_order_stat(m, pcp);
+
 #ifdef CONFIG_SMP
 		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, i);
 		seq_printf(m, "\n vm stats threshold: %d",