From patchwork Thu Oct 28 11:56:52 2021
From: Ning Zhang <ningzhang@linux.alibaba.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Vladimir Davydov, Yu Zhao
Subject: [RFC 3/6] mm, thp: introduce zero subpages reclaim threshold
Date: Thu, 28 Oct 2021 19:56:52 +0800
Message-Id: <1635422215-99394-4-git-send-email-ningzhang@linux.alibaba.com>
In-Reply-To: <1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com>
References: <1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com>

In this patch, we add a memory.thp_reclaim_ctrl file for each memory cgroup
to control THP reclaim. The first control, "threshold", sets the reclaim
threshold. The default value is 16, which means that if a huge page is
estimated to contain at least 16 zero subpages, the huge page can be split
and the zero subpages reclaimed when zero subpage reclaim is enabled.

You can change this value with:

  echo "threshold $v" > /sys/fs/cgroup/memory/{memcg}/thp_reclaim_ctrl
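The current setting can be read back from the same file; per the seq_show
handler below, it prints a "threshold" line. For example, with the default
value (the memcg path is a placeholder):

  cat /sys/fs/cgroup/memory/{memcg}/thp_reclaim_ctrl
  threshold	16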
Signed-off-by: Ning Zhang <ningzhang@linux.alibaba.com>
---
 include/linux/huge_mm.h    |  3 ++-
 include/linux/memcontrol.h |  3 +++
 mm/huge_memory.c           |  9 ++++---
 mm/memcontrol.c            | 62 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c                |  4 ++-
 5 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 04607b1..304e3df 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -187,7 +187,8 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 
 #ifdef CONFIG_MEMCG
 extern int global_thp_reclaim;
-int zsr_get_hpage(struct hpage_reclaim *hr_queue, struct page **reclaim_page);
+int zsr_get_hpage(struct hpage_reclaim *hr_queue, struct page **reclaim_page,
+		  int threshold);
 unsigned long zsr_reclaim_hpage(struct lruvec *lruvec, struct page *page);
 static inline struct list_head *hpage_reclaim_list(struct page *page)
 {
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f99f13f..4815c56 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -237,6 +237,8 @@ enum thp_reclaim_state {
 	THP_RECLAIM_ENABLE,
 	THP_RECLAIM_MEMCG, /* For global configure*/
 };
+
+#define THP_RECLAIM_THRESHOLD_DEFAULT 16
 #endif
 /*
  * The memory controller data structure. The memory controller controls both
@@ -356,6 +358,7 @@ struct mem_cgroup {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	struct deferred_split deferred_split_queue;
 	int thp_reclaim;
+	int thp_reclaim_threshold;
 #endif
 
 	struct mem_cgroup_per_node *nodeinfo[];
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 84fd738..40a9879 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3270,7 +3270,7 @@ static inline bool is_zero_page(struct page *page)
  * We'll split the huge page iff it contains at least 1/32 zeros,
  * estimate it by checking some discrete unsigned long values.
  */
-static bool hpage_estimate_zero(struct page *page)
+static bool hpage_estimate_zero(struct page *page, int threshold)
 {
 	unsigned int i, maybe_zero_pages = 0, offset = 0;
 	void *addr;
@@ -3281,7 +3281,7 @@ static bool hpage_estimate_zero(struct page *page)
 		if (unlikely((offset + 1) * BYTES_PER_LONG > PAGE_SIZE))
 			offset = 0;
 		if (*(const unsigned long *)(addr + offset) == 0UL) {
-			if (++maybe_zero_pages == HPAGE_PMD_NR >> 5) {
+			if (++maybe_zero_pages == threshold) {
 				kunmap(page);
 				return true;
 			}
@@ -3456,7 +3456,8 @@ static unsigned long reclaim_zero_subpages(struct list_head *list,
 * be stored in reclaim_page; otherwise, just delete the page from the
 * queue.
 */
-int zsr_get_hpage(struct hpage_reclaim *hr_queue, struct page **reclaim_page)
+int zsr_get_hpage(struct hpage_reclaim *hr_queue, struct page **reclaim_page,
+		  int threshold)
 {
 	struct page *page = NULL;
 	unsigned long flags;
@@ -3482,7 +3483,7 @@ int zsr_get_hpage(struct hpage_reclaim *hr_queue, struct page **reclaim_page)
 
 	spin_unlock_irqrestore(&hr_queue->reclaim_queue_lock, flags);
 
-	if (hpage_can_reclaim(page) && hpage_estimate_zero(page) &&
+	if (hpage_can_reclaim(page) && hpage_estimate_zero(page, threshold) &&
 	    !isolate_lru_page(page)) {
 		__mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON,
 				      HPAGE_PMD_NR);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae96781..7ba3c69 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4498,6 +4498,61 @@ static int mem_cgroup_thp_reclaim_write(struct cgroup_subsys_state *css,
 
 	return 0;
 }
+
+static inline char *strsep_s(char **s, const char *ct)
+{
+	char *p;
+
+	while ((p = strsep(s, ct))) {
+		if (*p)
+			return p;
+	}
+
+	return NULL;
+}
+
+static int memcg_thp_reclaim_ctrl_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	int thp_reclaim_threshold = READ_ONCE(memcg->thp_reclaim_threshold);
+
+	seq_printf(m, "threshold\t%d\n", thp_reclaim_threshold);
+
+	return 0;
+}
+
+static ssize_t memcg_thp_reclaim_ctrl_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	char *key, *value;
+	int ret;
+
+	key = strsep_s(&buf, " \t\n");
+	if (!key)
+		return -EINVAL;
+
+	if (!strcmp(key, "threshold")) {
+		int threshold;
+
+		value = strsep_s(&buf, " \t\n");
+		if (!value)
+			return -EINVAL;
+
+		ret = kstrtouint(value, 0, &threshold);
+		if (ret)
+			return ret;
+
+		if (threshold > HPAGE_PMD_NR || threshold < 1)
+			return -EINVAL;
+
+		xchg(&memcg->thp_reclaim_threshold, threshold);
+	} else
+		return -EINVAL;
+
+	return nbytes;
+}
 #endif
 
 #ifdef CONFIG_CGROUP_WRITEBACK
@@ -5068,6 +5123,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		.read_u64 = mem_cgroup_thp_reclaim_read,
 		.write_u64 = mem_cgroup_thp_reclaim_write,
 	},
+	{
+		.name = "thp_reclaim_ctrl",
+		.seq_show = memcg_thp_reclaim_ctrl_show,
+		.write = memcg_thp_reclaim_ctrl_write,
+	},
 #endif
 	{ },	/* terminate */
 };
@@ -5265,6 +5325,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	memcg->deferred_split_queue.split_queue_len = 0;
 	memcg->thp_reclaim = THP_RECLAIM_DISABLE;
+	memcg->thp_reclaim_threshold = THP_RECLAIM_THRESHOLD_DEFAULT;
 #endif
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;
@@ -5300,6 +5361,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		memcg->thp_reclaim = parent->thp_reclaim;
+		memcg->thp_reclaim_threshold = parent->thp_reclaim_threshold;
 #endif
 	} else {
 		page_counter_init(&memcg->memory, NULL);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f4ff14d..fcc80a6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2794,6 +2794,7 @@ static unsigned long reclaim_hpage_zero_subpages(struct lruvec *lruvec,
 	struct mem_cgroup *memcg;
 	struct hpage_reclaim *hr_queue;
 	int nid = lruvec->pgdat->node_id;
+	int threshold;
 	unsigned long nr_reclaimed = 0, nr_scanned = 0, nr_to_scan;
 
 	memcg = lruvec_memcg(lruvec);
@@ -2806,11 +2807,12 @@ static unsigned long reclaim_hpage_zero_subpages(struct lruvec *lruvec,
 	/* The last scan loop will scan all the huge pages.*/
 	nr_to_scan = priority == 0 ?
 		     0 : MAX_SCAN_HPAGE;
+	threshold = READ_ONCE(memcg->thp_reclaim_threshold);
 
 	do {
 		struct page *page = NULL;
 
-		if (zsr_get_hpage(hr_queue, &page))
+		if (zsr_get_hpage(hr_queue, &page, threshold))
 			break;
 
 		if (!page)
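
[Editor's usage note, not part of the patch: the write handler above rejects
values outside 1..HPAGE_PMD_NR, so assuming 2MB huge pages with 4K base pages
(HPAGE_PMD_NR == 512), writes behave as sketched below; the memcg path is a
placeholder.]

  echo "threshold 0" > /sys/fs/cgroup/memory/{memcg}/thp_reclaim_ctrl
  # fails with -EINVAL: below the minimum of 1
  echo "threshold 512" > /sys/fs/cgroup/memory/{memcg}/thp_reclaim_ctrl
  # accepted: essentially every subpage must sample as zero before a split
  echo "threshold 513" > /sys/fs/cgroup/memory/{memcg}/thp_reclaim_ctrl
  # fails with -EINVAL: above HPAGE_PMD_NR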