From patchwork Thu Aug 18 09:04:29 2022
X-Patchwork-Submitter: mawupeng
X-Patchwork-Id: 12947074
From: Wupeng Ma <mawupeng1@huawei.com>
Subject: [PATCH -next 1/2] mm: Cap zone movable's min wmark to small value
Date: Thu, 18 Aug 2022 17:04:29 +0800
Message-ID: <20220818090430.2859992-2-mawupeng1@huawei.com>
In-Reply-To: <20220818090430.2859992-1-mawupeng1@huawei.com>
References: <20220818090430.2859992-1-mawupeng1@huawei.com>
From: Ma Wupeng <mawupeng1@huawei.com>

min_free_kbytes is based on gfp_zone(GFP_USER), which does not include
ZONE_MOVABLE, yet the movable zone still gets its min share in
__setup_per_zone_wmarks(), which does not make sense. Like highmem
pages, __GFP_HIGH and PF_MEMALLOC allocations usually don't need
movable pages, so there is no need to reserve min pages for the movable
zone. Cap pages_min for the movable zone to a small value, just like
highmem pages.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d3f62316c137..4f62e3d74bf2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8638,7 +8638,7 @@ static void __setup_per_zone_wmarks(void)
 
 	/* Calculate total number of !ZONE_HIGHMEM pages */
 	for_each_zone(zone) {
-		if (!is_highmem(zone))
+		if (!is_highmem(zone) && zone_idx(zone) != ZONE_MOVABLE)
 			lowmem_pages += zone_managed_pages(zone);
 	}
 
@@ -8648,7 +8648,7 @@ static void __setup_per_zone_wmarks(void)
 		spin_lock_irqsave(&zone->lock, flags);
 		tmp = (u64)pages_min * zone_managed_pages(zone);
 		do_div(tmp, lowmem_pages);
-		if (is_highmem(zone)) {
+		if (is_highmem(zone) || zone_idx(zone) == ZONE_MOVABLE) {
 			/*
 			 * __GFP_HIGH and PF_MEMALLOC allocations usually don't
 			 * need highmem pages, so cap pages_min to a small

From patchwork Thu Aug 18 09:04:30 2022
X-Patchwork-Submitter: mawupeng
X-Patchwork-Id: 12946925
From: Wupeng Ma <mawupeng1@huawei.com>
Subject: [PATCH -next 2/2] mm: sysctl: Introduce per zone watermark_scale_factor
Date: Thu, 18 Aug 2022 17:04:30 +0800
Message-ID: <20220818090430.2859992-3-mawupeng1@huawei.com>
In-Reply-To: <20220818090430.2859992-1-mawupeng1@huawei.com>
References: <20220818090430.2859992-1-mawupeng1@huawei.com>
From: Ma Wupeng <mawupeng1@huawei.com>

A system may have little normal zone memory but huge movable zone
memory in the following situations:

- With kernelcore=nn% or kernelcore=mirror, a movable zone is added,
  and in most cases it is bigger than the normal zone.
- With movable nodes, the system has multiple NUMA nodes that contain
  only a movable zone, and that zone holds plenty of memory.

Since the kernel and drivers can in most cases only use memory from
non-movable zones, the normal zone needs to raise its watermark to
reserve more memory. However, the current watermark_scale_factor
controls all zones at once and cannot be set per zone. Reserving
memory in non-movable zones therefore raises the watermark in movable
zones as well, which leads to inefficient kswapd.

To solve this, a per-zone watermark_scale_factor is introduced so that
each zone's watermark can be tuned separately. This brings the
following advantages:

- each zone can set its own watermark, which brings flexibility
- kswapd becomes more efficient if the watermarks are tuned well

Here is real watermark data from my qemu machine (with THP disabled).
With watermark_scale_factor = 10, only 1440 (772-68+807-71) pages
(5.76M) are reserved for a system with 96G of memory. However, if the
factor is raised to 100, the movable zone's watermark increases to
231908 pages (93M), which is too much. This situation is even worse
with 32G of normal zone memory and 1T of movable zone memory.
Modified                | Vanilla wm_factor = 10  | Vanilla wm_factor = 30
Node 0, zone DMA        | Node 0, zone DMA        | Node 0, zone DMA
  min          68       |   min          68       |   min          68
  low        7113       |   low         772       |   low        7113
  high    **14158**     |   high     **1476**     |   high    **14158**
Node 0, zone Normal     | Node 0, zone Normal     | Node 0, zone Normal
  min          71       |   min          71       |   min          71
  low        7438       |   low         807       |   low        7438
  high      14805       |   high       1543       |   high      14805
Node 0, zone Movable    | Node 0, zone Movable    | Node 0, zone Movable
  min        1455       |   min        1455       |   min        1455
  low       16388       |   low       16386       |   low      150787
  high    **31321**     |   high    **31317**     |   high   **300119**
Node 1, zone Movable    | Node 1, zone Movable    | Node 1, zone Movable
  min         804       |   min         804       |   min         804
  low        9061       |   low        9061       |   low       83379
  high    **17318**     |   high    **17318**     |   high   **165954**

With the per-zone watermark_scale_factor, only the DMA/normal zones
raise their watermarks via the following command, while the huge
movable zone stays the same:

  % echo 100 100 100 10 > /proc/sys/vm/watermark_scale_factor

THP is disabled because khugepaged_min_free_kbytes_update() would
update the min watermark.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 Documentation/admin-guide/sysctl/vm.rst |  6 ++++
 include/linux/mm.h                      |  2 +-
 include/linux/mmzone.h                  |  4 +--
 kernel/sysctl.c                         |  2 --
 mm/page_alloc.c                         | 37 ++++++++++++++++++++-----
 5 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 9b833e439f09..ec240aa45322 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -1002,6 +1002,12 @@ that the number of free pages kswapd maintains for
 latency reasons is too small for the allocation bursts occurring in the
 system. This knob can then be used to tune kswapd aggressiveness
 accordingly.
 
+The watermark_scale_factor is an array. You can set each zone's watermark
+separately and can be seen by reading this file::
+
+  % cat /proc/sys/vm/watermark_scale_factor
+  10 10 10 10
+
 zone_reclaim_mode
 =================

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3bedc449c14d..7f1eba1541f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2525,7 +2525,7 @@ extern void setup_per_cpu_pageset(void);
 /* page_alloc.c */
 extern int min_free_kbytes;
 extern int watermark_boost_factor;
-extern int watermark_scale_factor;
+extern int watermark_scale_factor[MAX_NR_ZONES];
 extern bool arch_has_descending_max_zone_pfns(void);
 
 /* nommu.c */

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aa712aa35744..8e6258186d3c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1173,8 +1173,8 @@ struct ctl_table;
 
 int min_free_kbytes_sysctl_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
-int watermark_scale_factor_sysctl_handler(struct ctl_table *, int, void *,
-		size_t *, loff_t *);
+int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
+		void *buffer, size_t *length, loff_t *ppos);
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int, void *,
 		size_t *, loff_t *);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 205d605cacc5..d16d06c71e5a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2251,8 +2251,6 @@ static struct ctl_table vm_table[] = {
 		.maxlen		= sizeof(watermark_scale_factor),
 		.mode		= 0644,
 		.proc_handler	= watermark_scale_factor_sysctl_handler,
-		.extra1		= SYSCTL_ONE,
-		.extra2		= SYSCTL_THREE_THOUSAND,
 	},
 	{
 		.procname	= "percpu_pagelist_high_fraction",

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4f62e3d74bf2..b81dcda9f702 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -421,7 +421,6 @@ compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 int watermark_boost_factor __read_mostly = 15000;
-int watermark_scale_factor = 10;
 
 static unsigned long nr_kernel_pages __initdata;
 static unsigned long nr_all_pages __initdata;
@@ -449,6 +448,20 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+int watermark_scale_factor[MAX_NR_ZONES] = {
+#ifdef CONFIG_ZONE_DMA
+	[ZONE_DMA] = 10,
+#endif
+#ifdef CONFIG_ZONE_DMA32
+	[ZONE_DMA32] = 10,
+#endif
+	[ZONE_NORMAL] = 10,
+#ifdef CONFIG_HIGHMEM
+	[ZONE_HIGHMEM] = 10,
+#endif
+	[ZONE_MOVABLE] = 10,
+};
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 /*
  * During boot we initialize deferred pages on-demand, as needed, but once
@@ -8643,6 +8656,7 @@ static void __setup_per_zone_wmarks(void)
 	}
 
 	for_each_zone(zone) {
+		int zone_wm_factor;
 		u64 tmp;
 
 		spin_lock_irqsave(&zone->lock, flags);
@@ -8676,9 +8690,10 @@ static void __setup_per_zone_wmarks(void)
 		 * scale factor in proportion to available memory, but
 		 * ensure a minimum size on small systems.
 		 */
+		zone_wm_factor = watermark_scale_factor[zone_idx(zone)];
 		tmp = max_t(u64, tmp >> 2,
-			    mult_frac(zone_managed_pages(zone),
-				      watermark_scale_factor, 10000));
+			    mult_frac(zone_managed_pages(zone), zone_wm_factor,
+				      10000));
 
 		zone->watermark_boost = 0;
 		zone->_watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
@@ -8795,14 +8810,22 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
 	return 0;
 }
 
+/*
+ * watermark_scale_factor_sysctl_handler - just a wrapper around
+ * proc_dointvec() so that we can call setup_per_zone_wmarks()
+ * whenever watermark_scale_factor changes.
+ */
 int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
-	int rc;
+	int i;
 
-	rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
-	if (rc)
-		return rc;
+	proc_dointvec_minmax(table, write, buffer, length, ppos);
+
+	for (i = 0; i < MAX_NR_ZONES; i++)
+		watermark_scale_factor[i] =
+			clamp(watermark_scale_factor[i], 1,
+			      *(int *)SYSCTL_THREE_THOUSAND);
 
 	if (write)
 		setup_per_zone_wmarks();