From patchwork Mon Apr 15 08:12:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kefeng Wang <wangkefeng.wang@huawei.com>
X-Patchwork-Id: 13629607
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Andrew Morton
Yan , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Yang Shi , Yu Zhao , , Kefeng Wang Subject: [PATCH rfc 2/3] mm: add control to allow specified high-order pages stored on PCP list Date: Mon, 15 Apr 2024 16:12:19 +0800 Message-ID: <20240415081220.3246839-3-wangkefeng.wang@huawei.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20240415081220.3246839-1-wangkefeng.wang@huawei.com> References: <20240415081220.3246839-1-wangkefeng.wang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.112.125] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggpemm100001.china.huawei.com (7.185.36.93) X-Stat-Signature: to1mgtmmanqwwjhpyugznybqeny4ctyh X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 7F7CF120020 X-Rspam-User: X-HE-Tag: 1713168766-434950 X-HE-Meta: U2FsdGVkX19lc/JRSuF58NjutJGkbiXLFHzSgZTJMTdVOLTv1X1IQhTdQvRBuGNoReSd+8FiHBG9gvY5EV+95/zdgAFVZYW9LnsP3ZwVTiEQsp0i0aEzYl7YQBvKTGdoFTydhsQ2Uo+xhR77hKGyLA1M67U3q1M657kAikVNf46kZ4CxM4lmYH3uFKoYNp/oB9NtisgzG4ZzN5JOSjlqYmRdYEGmHUmMiED9yPkhNh3GR6Mi3oHM7EVwqk/SAdTA7Tv71prNj3QAw2EuPXwjSyxMXwGyszk2ej5DBg9iUdGe8lcMmQR+jk99AiGTLrQJovUG2WP09PSZ7ju5e7WtDLsPi0/bAyq/qljJbRzVYPO0f2McPgsEk0nB2SdWTdLxiiLy5rrcJ56xYCMrWaYlxdB907RYUxhZsyQDPQJzKEIXSF8Y7X3wC98hHYv5PwcgIW5Rkre5fYfDjTSshHWBwo5Xl6xS85ubt9IqQ7xifJvc6fXb0XoIu/WKm7c/G199xINf5tQ6AXxqnkNHsxQVfgSpBy/zAU2i7go/OdzeAD7xXnlKeWSevwLu/wP/tO9flAo5pjn4Kqgs+V9dzyp3mXDybKcoYqu7i/b6fCXCQglyNNp9RzJHEl98EMzyOyI6/AIhBBLfcpeWyUda1PdoyhTG/fqFOM2gxsmU5cHP3KjK11M1ef2PshJn+eoKeURulmm2Ed39upz4tktZkQ7PmgxmebGz3NpZOmp5sIJV6PFlSUs9WQkJJjidKgdGNp408XibuELx6UETG7AY0Ga2q613G9IaOSMVDN0MexHRNTkxXxGWMsLfQbbsl88VrEDdsFiXjsz2c+bRiThMYifYeQG6gmxK4FfP2Vu2Qo1EM6fx1wyNXtaI4sf0nQjlQuBs3yiJBg31v4Q7W8iDQflDGe5Lf0OY2Ixohw9BTElMKFFSyQkddfDMdbFrUQ5WmGzMAIPPwmDGIXrAfH0AqlM KB0hyefE gMsO1+7iH1D4DI5yhsb74ZwO8Wr+wHBVPRTVvI2bgT1yA8P63SlR5uGwWkv0aBTy/iD4d0fa72WJtFBvY5g0r1huIK+rSv4hkPnZMBXQSmIoSugIH5t3bfCMS15IjOZ1TduKfRSoZXAuWVWQ2VSsUsL4k+w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The high-order pages stored on PCP list may not always win, even herts some workloads, so it is disabled by default for high-orders except PMD_ORDER. Since there is already per-supported-THP-size interfaces to configrate mTHP behaviours, adding a new control pcp_enabled under above interfaces to allow user to enable/disable the specified high-order pages stored on PCP list or not, but it can't change the existing behaviour for order = PMD_ORDER and order <= PAGE_ALLOC_COSTLY_ORDER, they are always enabled and can't be disabled, meanwhile, when disabled by pcp_enabled for other high-orders, pcplists will be drained. Signed-off-by: Kefeng Wang --- Documentation/admin-guide/mm/transhuge.rst | 11 +++++ include/linux/gfp.h | 1 + include/linux/huge_mm.h | 1 + mm/huge_memory.c | 47 ++++++++++++++++++++++ mm/page_alloc.c | 16 ++++++++ 5 files changed, 76 insertions(+) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 04eb45a2f940..3cb91336f81a 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -189,6 +189,17 @@ madvise never should be self-explanatory. + +There's also sysfs knob to control hugepage to be stored on PCP lists for +high-orders(greated than PAGE_ALLOC_COSTLY_ORDER), which could reduce +the zone lock contention when allocate hige-order pages frequently. 
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 11 +++++
 include/linux/gfp.h                        |  1 +
 include/linux/huge_mm.h                    |  1 +
 mm/huge_memory.c                           | 47 ++++++++++++++++++++++
 mm/page_alloc.c                            | 16 ++++++++
 5 files changed, 76 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 04eb45a2f940..3cb91336f81a 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -189,6 +189,17 @@ madvise
 never
 	should be self-explanatory.
 
+
+There's also a sysfs knob to control whether pages of a given high order
+(greater than PAGE_ALLOC_COSTLY_ORDER) are stored on the PCP lists, which
+can reduce zone lock contention when high-order pages are allocated
+frequently. Please note that the PCP behaviour of low-order and PMD-order
+pages cannot be changed; for the other high orders, storing on the PCP
+lists can be enabled by writing 1 or disabled again by writing 0::
+
+	echo 0 >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/pcp_enabled
+	echo 1 >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/pcp_enabled
+
 By default kernel tries to use huge, PMD-mappable zero page on read page
 fault to anonymous mapping. It's possible to disable huge zero page by
 writing 0 or enable it back by writing 1::
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 450c2cbcf04b..2ae1157abd6e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -365,6 +365,7 @@ extern void page_frag_free(void *addr);
 
 void page_alloc_init_cpuhp(void);
 int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
+void drain_all_zone_pages(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b67294d5814f..86306becfd52 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -108,6 +108,7 @@ extern unsigned long transparent_hugepage_flags;
 extern unsigned long huge_anon_orders_always;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
+extern unsigned long huge_pcp_allow_orders;
 
 static inline bool hugepage_global_enabled(void)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9a1b57ef9c60..9b8a8aa36526 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -512,8 +512,49 @@ static ssize_t thpsize_enabled_store(struct kobject *kobj,
 static struct kobj_attribute thpsize_enabled_attr = __ATTR(enabled, 0644,
 	thpsize_enabled_show, thpsize_enabled_store);
 
+unsigned long huge_pcp_allow_orders __read_mostly;
+static ssize_t thpsize_pcp_enabled_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+
+	return sysfs_emit(buf, "%d\n",
+			  !!test_bit(order, &huge_pcp_allow_orders));
+}
+
+static ssize_t thpsize_pcp_enabled_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	unsigned long value;
+	int ret;
+
+	if (order <= PAGE_ALLOC_COSTLY_ORDER || order == PMD_ORDER)
+		return -EINVAL;
+
+	ret = kstrtoul(buf, 10, &value);
+	if (ret < 0)
+		return ret;
+	if (value > 1)
+		return -EINVAL;
+
+	if (value) {
+		set_bit(order, &huge_pcp_allow_orders);
+	} else {
+		if (test_and_clear_bit(order, &huge_pcp_allow_orders))
+			drain_all_zone_pages();
+	}
+
+	return count;
+}
+
+static struct kobj_attribute thpsize_pcp_enabled_attr = __ATTR(pcp_enabled,
+	0644, thpsize_pcp_enabled_show, thpsize_pcp_enabled_store);
+
 static struct attribute *thpsize_attrs[] = {
 	&thpsize_enabled_attr.attr,
+	&thpsize_pcp_enabled_attr.attr,
 	NULL,
 };
 
@@ -624,6 +665,8 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 */
 	huge_anon_orders_inherit = BIT(PMD_ORDER);
 
+	huge_pcp_allow_orders = BIT(PMD_ORDER);
+
 	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
 	if (unlikely(!*hugepage_kobj)) {
 		pr_err("failed to create transparent hugepage kobject\n");
@@ -658,6 +701,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 			err = PTR_ERR(thpsize);
 			goto remove_all;
 		}
+
+		if (order <= PAGE_ALLOC_COSTLY_ORDER)
+			huge_pcp_allow_orders |= BIT(order);
+
 		list_add(&thpsize->node, &thpsize_list);
 		order = next_order(&orders, order);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2248afc7b73a..25fd3fe30cb0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -537,6 +537,8 @@ static inline bool pcp_allowed_order(unsigned int order)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order == PCP_MAX_ORDER)
 		return true;
+	if (BIT(order) & huge_pcp_allow_orders)
+		return true;
 #endif
 	return false;
 }
@@ -6705,6 +6707,20 @@ void zone_pcp_reset(struct zone *zone)
 	}
 }
 
+void drain_all_zone_pages(void)
+{
+	struct zone *zone;
+
+	mutex_lock(&pcp_batch_high_lock);
+	for_each_populated_zone(zone)
+		__zone_set_pageset_high_and_batch(zone, 0, 0, 1);
+	__drain_all_pages(NULL, true);
+	for_each_populated_zone(zone)
+		__zone_set_pageset_high_and_batch(zone, zone->pageset_high_min,
+			zone->pageset_high_max, zone->pageset_batch);
+	mutex_unlock(&pcp_batch_high_lock);
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be in a single zone, must not contain holes,