From patchwork Wed Oct 30 08:33:09 2024
X-Patchwork-Submitter: Gutierrez Asier <gutierrez.asier@huawei-partners.com>
X-Patchwork-Id: 13856115
From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Subject: [RFC PATCH 1/3] mm: Add thp_flags control for cgroup
Date: Wed, 30 Oct 2024 16:33:09 +0800
Message-ID: <20241030083311.965933-2-gutierrez.asier@huawei-partners.com>
In-Reply-To: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>
References: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>

Expose a new file in the memory cgroup called memory.thp_enabled. This
file works in the same way and uses the same format as the THP setting
in /sys/kernel/mm/transparent_hugepage/enabled. The patch allows
reading from and writing to that file, effectively changing the THP
policy of the memory cgroup. New cgroups inherit the THP policies of
their parents.
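For illustration, an interaction with the new file could look as
follows (the cgroup v2 mount point and the group name "app" are
assumed here, and the initial value depends on the kernel THP config):

  # mkdir /sys/fs/cgroup/app
  # cat /sys/fs/cgroup/app/memory.thp_enabled
  [always] madvise never
  # echo madvise > /sys/fs/cgroup/app/memory.thp_enabled
  # cat /sys/fs/cgroup/app/memory.thp_enabled
  always [madvise] never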
Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Signed-off-by: Anatoly Stepanov
Reviewed-by: Alexander Kozhevnikov
---
 include/linux/huge_mm.h    |  5 +++
 include/linux/memcontrol.h | 15 +++++++
 mm/huge_memory.c           | 71 ++++++++++++++++++++-----------
 mm/memcontrol.c            | 86 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 153 insertions(+), 24 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..86c0fb4c0b28 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -53,6 +53,9 @@ enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
 };
 
+#define HUGEPAGE_FLAGS_ENABLED_MASK ((1UL << TRANSPARENT_HUGEPAGE_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
+
 struct kobject;
 struct kobj_attribute;
 
@@ -430,6 +433,8 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
 			   pmd_t *pmdp, struct folio *folio);
 
+int thp_enabled_parse(const char *buf, unsigned long *flags);
+const char *thp_enabled_string(unsigned long flags);
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 static inline bool folio_test_pmd_mappable(struct folio *folio)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0e5bf25d324f..87b5fe93e19d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -315,6 +315,12 @@ struct mem_cgroup {
 	spinlock_t event_list_lock;
 #endif /* CONFIG_MEMCG_V1 */
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	unsigned long thp_flags;
+	unsigned long thp_anon_orders_always;
+	unsigned long thp_anon_orders_madvise;
+	unsigned long thp_anon_orders_inherit;
+#endif
 	struct mem_cgroup_per_node *nodeinfo[];
 };
 
@@ -1615,6 +1621,15 @@ struct sock;
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 			     gfp_t gfp_mask);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+int memory_thp_enabled_show(struct seq_file *m, void *v);
+ssize_t memory_thp_enabled_write(struct kernfs_open_file *of, char *buf,
+				 size_t nbytes, loff_t off);
+
+int mem_cgroup_thp_flags_update_all(unsigned long flags, unsigned long mask);
+unsigned long memcg_get_thp_flags_all(unsigned long mask);
+unsigned long memcg_get_thp_flags(struct vm_area_struct *vma);
+#endif
 #ifdef CONFIG_MEMCG
 extern struct static_key_false memcg_sockets_enabled_key;
 #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 67c86a5d64a6..0fbdd8213443 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -46,6 +46,8 @@
 #include "internal.h"
 #include "swap.h"
 
+#include <linux/memcontrol.h>
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/thp.h>
 
@@ -287,21 +289,43 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
 
 static struct shrinker *huge_zero_page_shrinker;
 
-#ifdef CONFIG_SYSFS
-static ssize_t enabled_show(struct kobject *kobj,
-			    struct kobj_attribute *attr, char *buf)
+const char *thp_enabled_string(unsigned long flags)
 {
 	const char *output;
 
-	if (test_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags))
+	if (test_bit(TRANSPARENT_HUGEPAGE_FLAG, &flags))
 		output = "[always] madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
-			  &transparent_hugepage_flags))
+	else if (test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &flags))
 		output = "always [madvise] never";
 	else
 		output = "always madvise [never]";
 
-	return sysfs_emit(buf, "%s\n", output);
+	return output;
+}
+
+int thp_enabled_parse(const char *buf, unsigned long *flags)
+{
+	if (sysfs_streq(buf, "always")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_FLAG, flags);
+	} else if (sysfs_streq(buf, "madvise")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, flags);
+	} else if (sysfs_streq(buf, "never")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, flags);
+	} else
+		return -EINVAL;
+
+	return 0;
+}
+
+#ifdef CONFIG_SYSFS
+static ssize_t enabled_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buf)
+{
+	unsigned long flags = transparent_hugepage_flags;
+
+	return sysfs_emit(buf, "%s\n", thp_enabled_string(flags));
 }
 
 static ssize_t enabled_store(struct kobject *kobj,
@@ -309,24 +333,21 @@ static ssize_t enabled_store(struct kobject *kobj,
 			     const char *buf, size_t count)
 {
 	ssize_t ret = count;
+	int err;
 
-	if (sysfs_streq(buf, "always")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "madvise")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "never")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else
-		ret = -EINVAL;
+	ret = thp_enabled_parse(buf, &transparent_hugepage_flags) ? : count;
+	if (ret <= 0)
+		goto out;
 
-	if (ret > 0) {
-		int err = start_stop_khugepaged();
-		if (err)
-			ret = err;
-	}
+	if (IS_ENABLED(CONFIG_MEMCG) && !mem_cgroup_disabled())
+		err = mem_cgroup_thp_flags_update_all(transparent_hugepage_flags,
+						      HUGEPAGE_FLAGS_ENABLED_MASK);
+	else
+		err = start_stop_khugepaged();
+
+	if (err)
+		ret = err;
+out:
 	return ret;
 }
 
@@ -1036,7 +1057,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
 {
 	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);
-
+#ifdef CONFIG_MEMCG
+	unsigned long transparent_hugepage_flags = memcg_get_thp_flags(vma);
+#endif
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
 		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d563fb515766..2b25c45c85c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -970,6 +970,33 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
 	return memcg;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline bool memcg_thp_always_enabled(struct mem_cgroup *memcg)
+{
+	return test_bit(TRANSPARENT_HUGEPAGE_FLAG, &memcg->thp_flags);
+}
+
+static inline bool memcg_thp_madvise_enabled(struct mem_cgroup *memcg)
+{
+	return test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &memcg->thp_flags);
+}
+
+unsigned long memcg_get_thp_flags(struct vm_area_struct *vma)
+{
+	unsigned long flags = 0UL;
+	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(vma->vm_mm);
+
+	if (!memcg)
+		goto out;
+
+	flags = READ_ONCE(memcg->thp_flags);
+out:
+	if (memcg)
+		css_put(&memcg->css);
+	return flags;
+}
+#endif
+
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
@@ -3625,6 +3652,11 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable));
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		WRITE_ONCE(memcg->thp_flags, READ_ONCE(parent->thp_flags));
+		WRITE_ONCE(memcg->thp_anon_orders_inherit,
+			   READ_ONCE(parent->thp_anon_orders_inherit));
 #endif
 	} else {
 		init_memcg_stats();
@@ -3634,6 +3666,17 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 #ifdef CONFIG_MEMCG_V1
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		WRITE_ONCE(memcg->thp_flags,
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
+			(1<<TRANSPARENT_HUGEPAGE_FLAG)|
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE_MADVISE
+			(1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)|
+#endif
+			0);
+		WRITE_ONCE(memcg->thp_anon_orders_inherit, BIT(PMD_ORDER));
 #endif
 		root_mem_cgroup = memcg;
 		return &memcg->css;
@@ -4315,6 +4358,19 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+DEFINE_MUTEX(memcg_thp_flags_mutex);
+
+int memory_thp_enabled_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+	unsigned long flags = READ_ONCE(memcg->thp_flags);
+
+	seq_printf(m, "%s\n", thp_enabled_string(flags));
+	return 0;
+}
+#endif
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -4383,6 +4439,12 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	{
+		.name = "thp_enabled",
+		.seq_show = memory_thp_enabled_show,
+	},
+#endif
 	{ }	/* terminate */
 };
 
@@ -4844,6 +4906,30 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 	refill_stock(memcg, nr_pages);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+int mem_cgroup_thp_flags_update_all(unsigned long new_flags, unsigned long mask)
+{
+	int ret = 0;
+	struct mem_cgroup *iter, *memcg = root_mem_cgroup;
+	unsigned long enabled_mask =
+		(1UL << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG) |
+		(1UL << TRANSPARENT_HUGEPAGE_FLAG);
+
+	mutex_lock(&memcg_thp_flags_mutex);
+	enabled_mask &= new_flags;
+
+	for_each_mem_cgroup_tree(iter, memcg) {
+		unsigned long old_flags = iter->thp_flags;
+
+		iter->thp_flags = (old_flags & ~mask) | new_flags;
+	}
+
+	mutex_unlock(&memcg_thp_flags_mutex);
+	return ret;
+}
+
+#endif
+
 static int __init cgroup_memory(char *s)
 {
 	char *token;

From patchwork Wed Oct 30 08:33:10 2024
X-Patchwork-Submitter: Gutierrez Asier <gutierrez.asier@huawei-partners.com>
X-Patchwork-Id: 13856114
From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Subject: [RFC PATCH 2/3] mm: Support for huge pages in cgroups
Date: Wed, 30 Oct 2024 16:33:10 +0800
Message-ID: <20241030083311.965933-3-gutierrez.asier@huawei-partners.com>
In-Reply-To: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>
References: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>

This patch adds support for selecting the correct THP order mask
according to the memory cgroup THP policy. Khugepaged lazy collapsing
and a kernel boot parameter THP override were also added.
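As an illustrative sketch (group path and PID are assumed): once a
cgroup's policy enables THP, moving a task into it lets khugepaged
pick up the task's mm and lazily collapse eligible VMAs, while moving
it into a group set to "never" drops the mm from khugepaged:

  # echo always > /sys/fs/cgroup/app/memory.thp_enabled
  # echo 1234 > /sys/fs/cgroup/app/cgroup.procs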
Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Signed-off-by: Anatoly Stepanov
Reviewed-by: Alexander Kozhevnikov
---
 include/linux/huge_mm.h    |  10 ++-
 include/linux/khugepaged.h |   2 +-
 include/linux/memcontrol.h |  11 +++
 mm/huge_memory.c           |  22 ++++--
 mm/khugepaged.c            |   8 +-
 mm/memcontrol.c            | 147 ++++++++++++++++++++++++++++++++++++-
 6 files changed, 187 insertions(+), 13 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 86c0fb4c0b28..f99ac9b7e5bc 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -207,6 +207,12 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
 	return orders;
 }
 
+#if defined(CONFIG_MEMCG) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+bool memcg_thp_vma_allowable_orders(struct vm_area_struct *vma,
+				    unsigned long vm_flags,
+				    unsigned long *res_mask);
+#endif
+
 static inline bool file_thp_enabled(struct vm_area_struct *vma)
 {
 	struct inode *inode;
@@ -255,7 +261,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (hugepage_global_always() ||
 	    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
 		mask |= READ_ONCE(huge_anon_orders_inherit);
-
+#if defined(CONFIG_MEMCG) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+	memcg_thp_vma_allowable_orders(vma, vm_flags, &mask);
+#endif
 	orders &= mask;
 	if (!orders)
 		return 0;
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index f68865e19b0b..50cabca48b93 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -9,7 +9,7 @@ extern struct attribute_group khugepaged_attr_group;
 
 extern int khugepaged_init(void);
 extern void khugepaged_destroy(void);
-extern int start_stop_khugepaged(void);
+extern int start_stop_khugepaged(bool force_stop);
 extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern void khugepaged_enter_vma(struct vm_area_struct *vma,
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 87b5fe93e19d..d78318782af8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -23,6 +23,7 @@
 #include <linux/vmstat.h>
 #include <linux/writeback.h>
 #include <linux/page-flags.h>
+#include <linux/khugepaged.h>
 
 struct mem_cgroup;
 struct obj_cgroup;
@@ -1069,6 +1070,9 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 
 void split_page_memcg(struct page *head, int old_order, int new_order);
 
+bool memcg_thp_vma_allowable_orders(struct vm_area_struct *vma,
+				    unsigned long vm_flags,
+				    unsigned long *res_mask);
 #else /* CONFIG_MEMCG */
 
 #define MEM_CGROUP_ID_SHIFT	0
@@ -1476,6 +1480,13 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
 static inline void split_page_memcg(struct page *head, int old_order, int new_order)
 {
 }
+
+static inline bool memcg_thp_vma_allowable_orders(struct vm_area_struct *vma,
+						  unsigned long vm_flags,
+						  unsigned long *res_mask)
+{
+	return false;
+}
 #endif /* CONFIG_MEMCG */
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0fbdd8213443..fdffdfc8605c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -172,15 +172,23 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	}
 
 	if (!vma_is_anonymous(vma)) {
+		bool memcg_enabled = false;
 		/*
 		 * Enforce sysfs THP requirements as necessary. Anonymous vmas
 		 * were already handled in thp_vma_allowable_orders().
 		 */
-		if (enforce_sysfs &&
-		    (!hugepage_global_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
-						    !hugepage_global_always())))
-			return 0;
+		if (enforce_sysfs) {
+			unsigned long mask = 0UL;
+
+			memcg_enabled = memcg_thp_vma_allowable_orders(vma, vm_flags, &mask);
+			if (memcg_enabled && !mask)
+				return 0;
+			if (!memcg_enabled && (!hugepage_global_enabled() ||
+			    (!(vm_flags & VM_HUGEPAGE) &&
+			     !hugepage_global_always())))
+				return 0;
+		}
 
 		/*
 		 * Trust that ->huge_fault() handlers know what they are doing
 		 * in fault path.
@@ -343,7 +351,7 @@ static ssize_t enabled_store(struct kobject *kobj,
 		err = mem_cgroup_thp_flags_update_all(transparent_hugepage_flags,
 						      HUGEPAGE_FLAGS_ENABLED_MASK);
 	else
-		err = start_stop_khugepaged();
+		err = start_stop_khugepaged(false);
 
 	if (err)
 		ret = err;
@@ -539,7 +547,7 @@ static ssize_t thpsize_enabled_store(struct kobject *kobj,
 	if (ret > 0) {
 		int err;
 
-		err = start_stop_khugepaged();
+		err = start_stop_khugepaged(false);
 		if (err)
 			ret = err;
 	}
@@ -812,7 +820,7 @@ static int __init hugepage_init(void)
 		return 0;
 	}
 
-	err = start_stop_khugepaged();
+	err = start_stop_khugepaged(false);
 	if (err)
 		goto err_khugepaged;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cdd1d8655a76..ebed9bf8cfb5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -415,6 +415,8 @@ static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
 
 static bool hugepage_pmd_enabled(void)
 {
+	if (IS_ENABLED(CONFIG_MEMCG) && !mem_cgroup_disabled())
+		return true;
 	/*
 	 * We cover both the anon and the file-backed case here; file-backed
 	 * hugepages, when configured in, are determined by the global control.
@@ -2586,7 +2588,7 @@ static void set_recommended_min_free_kbytes(void)
 	int nr_zones = 0;
 	unsigned long recommended_min;
 
-	if (!hugepage_pmd_enabled()) {
+	if (!hugepage_pmd_enabled() || !khugepaged_thread) {
 		calculate_min_free_kbytes();
 		goto update_wmarks;
 	}
@@ -2631,12 +2633,12 @@ static void set_recommended_min_free_kbytes(void)
 	setup_per_zone_wmarks();
 }
 
-int start_stop_khugepaged(void)
+int start_stop_khugepaged(bool force_stop)
 {
 	int err = 0;
 
 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_pmd_enabled()) {
+	if (hugepage_pmd_enabled() && !force_stop) {
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
 							"khugepaged");
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2b25c45c85c3..938e6894c0b3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -981,6 +981,37 @@ static inline bool memcg_thp_madvise_enabled(struct mem_cgroup *memcg)
 	return test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &memcg->thp_flags);
 }
 
+bool memcg_thp_vma_allowable_orders(struct vm_area_struct *vma,
+				    unsigned long vm_flags,
+				    unsigned long *res_mask)
+{
+	unsigned long mask = 0UL;
+
+	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(vma->vm_mm);
+
+	if (!memcg)
+		return false;
+
+	if (memcg_thp_always_enabled(memcg) ||
+	    ((vm_flags & VM_HUGEPAGE) &&
+	     memcg_thp_madvise_enabled(memcg))) {
+		if (!vma_is_anonymous(vma))
+			mask = THP_ORDERS_ALL_FILE_DEFAULT;
+		else {
+			mask = READ_ONCE(memcg->thp_anon_orders_always);
+
+			if (vm_flags & VM_HUGEPAGE)
+				mask |= READ_ONCE(memcg->thp_anon_orders_madvise);
+
+			mask = mask | READ_ONCE(memcg->thp_anon_orders_inherit);
+		}
+	}
+
+	css_put(&memcg->css);
+	*res_mask = mask;
+	return true;
+}
+
 unsigned long memcg_get_thp_flags(struct vm_area_struct *vma)
 {
 	unsigned long flags = 0UL;
@@ -3986,10 +4017,52 @@ static void mem_cgroup_kmem_attach(struct cgroup_taskset *tset)
 	}
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static int mem_cgroup_notify_khugepaged_cb(struct task_struct *p, void *arg)
+{
+	struct vm_area_struct *vma = NULL;
+	struct mem_cgroup *memcg = arg;
+	bool is_madvise = memcg_thp_madvise_enabled(memcg);
+	bool is_always = memcg_thp_always_enabled(memcg);
+
+	VMA_ITERATOR(vmi, p->mm, 0);
+
+	if (!is_always && !is_madvise) {
+		khugepaged_exit(p->mm);
+		return 0;
+	}
+
+	for_each_vma(vmi, vma) {
+		if (is_madvise && !(vma->vm_flags & VM_HUGEPAGE))
+			continue;
+
+		khugepaged_enter_vma(vma, vma->vm_flags);
+
+		if (test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
+			break;
+	}
+
+	return 0;
+}
+
+static void mem_cgroup_thp_attach(struct cgroup_taskset *tset)
+{
+	struct task_struct *task;
+	struct cgroup_subsys_state *css;
+
+	cgroup_taskset_for_each(task, css, tset) {
+		mem_cgroup_notify_khugepaged_cb(task, mem_cgroup_from_css(css));
+	}
+}
+#else
+static void mem_cgroup_thp_attach(struct cgroup_taskset *tset) {}
+#endif
+
 static void mem_cgroup_attach(struct cgroup_taskset *tset)
 {
 	mem_cgroup_lru_gen_attach(tset);
 	mem_cgroup_kmem_attach(tset);
+	mem_cgroup_thp_attach(tset);
 }
 
 static int seq_puts_memcg_tunable(struct seq_file *m, unsigned long value)
@@ -4369,6 +4442,54 @@ int memory_thp_enabled_show(struct seq_file *m, void *v)
 	seq_printf(m, "%s\n", thp_enabled_string(flags));
 	return 0;
 }
+
+static int mem_cgroup_notify_khugepaged(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+	int ret = 0;
+
+	for_each_mem_cgroup_tree(iter, memcg) {
+		struct css_task_iter it;
+		struct task_struct *task;
+
+		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
+		while (!ret && (task = css_task_iter_next(&it))) {
+			if (!task->mm || (task->flags & PF_KTHREAD))
+				continue;
+
+			ret = mem_cgroup_notify_khugepaged_cb(task, memcg);
+		}
+		css_task_iter_end(&it);
+		if (ret) {
+			mem_cgroup_iter_break(memcg, iter);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+ssize_t memory_thp_enabled_write(struct kernfs_open_file *of, char *buf,
+				 size_t nbytes, loff_t off)
+{
+	int err;
+	int ret = nbytes;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	buf = strstrip(buf);
+
+	mutex_lock(&memcg_thp_flags_mutex);
+	ret = thp_enabled_parse(buf, &memcg->thp_flags) ? : nbytes;
+	if (ret > 0) {
+		err = mem_cgroup_notify_khugepaged(memcg);
+		if (!err)
+			err = start_stop_khugepaged(false);
+		if (err)
+			ret = err;
+	}
+	mutex_unlock(&memcg_thp_flags_mutex);
+	return ret;
+}
 #endif
 
 static struct cftype memory_files[] = {
@@ -4443,6 +4564,7 @@ static struct cftype memory_files[] = {
 	{
 		.name = "thp_enabled",
 		.seq_show = memory_thp_enabled_show,
+		.write = memory_thp_enabled_write,
 	},
 #endif
 	{ }	/* terminate */
@@ -4909,7 +5031,9 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int mem_cgroup_thp_flags_update_all(unsigned long new_flags, unsigned long mask)
 {
-	int ret = 0;
+	int ret;
+	struct css_task_iter task_iter;
+	struct task_struct *task;
 	struct mem_cgroup *iter, *memcg = root_mem_cgroup;
 	unsigned long enabled_mask =
 		(1UL << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG) |
 		(1UL << TRANSPARENT_HUGEPAGE_FLAG);
@@ -4922,8 +5046,18 @@ int mem_cgroup_thp_flags_update_all(unsigned long new_flags, unsigned long mask)
 		unsigned long old_flags = iter->thp_flags;
 
 		iter->thp_flags = (old_flags & ~mask) | new_flags;
+
+		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &task_iter);
+		while ((task = css_task_iter_next(&task_iter))) {
+			if (!task->mm || (task->flags & PF_KTHREAD))
+				continue;
+
+			mem_cgroup_notify_khugepaged_cb(task, iter);
+		}
+		css_task_iter_end(&task_iter);
 	}
 
+	ret = start_stop_khugepaged(!enabled_mask);
 	mutex_unlock(&memcg_thp_flags_mutex);
 	return ret;
 }
@@ -5509,4 +5643,15 @@ static int __init mem_cgroup_swap_init(void)
 }
 subsys_initcall(mem_cgroup_swap_init);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static int __init mem_cgroup_thp_init(void)
+{
+	if (IS_ENABLED(CONFIG_MEMCG))
+		root_mem_cgroup->thp_flags = transparent_hugepage_flags;
+
+	return 0;
+}
+subsys_initcall(mem_cgroup_thp_init);
+#endif
 #endif /* CONFIG_SWAP */

From patchwork Wed Oct 30 08:33:11 2024
X-Patchwork-Submitter: Gutierrez Asier <gutierrez.asier@huawei-partners.com>
X-Patchwork-Id: 13856117
From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Subject: [RFC PATCH 3/3] mm: Add thp_defrag control for cgroup
Date: Wed, 30 Oct 2024 16:33:11 +0800
Message-ID: <20241030083311.965933-4-gutierrez.asier@huawei-partners.com>
In-Reply-To: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>
References: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>

This patch exposes a new file in memory cgroups, memory.thp_defrag,
which follows the style of /sys/kernel/mm/transparent_hugepage/defrag.
Support for distinct THP defrag policies per memory cgroup was also
added.

Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Signed-off-by: Anatoly Stepanov
Reviewed-by: Alexander Kozhevnikov
---
 include/linux/huge_mm.h    |   8 +++
 include/linux/memcontrol.h |   4 +-
 mm/huge_memory.c           | 116 ++++++++++++++++++++++---------------
 mm/memcontrol.c            |  31 ++++++++++
 4 files changed, 112 insertions(+), 47 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f99ac9b7e5bc..177c7d3578ed 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -56,6 +56,12 @@ enum transparent_hugepage_flag {
 #define HUGEPAGE_FLAGS_ENABLED_MASK ((1UL << TRANSPARENT_HUGEPAGE_FLAG) |\
 		(1UL << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
 
+#define HUGEPAGE_FLAGS_DEFRAG_MASK ((1UL << TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
+
 struct kobject;
 struct kobj_attribute;
 
@@ -442,7 +448,9 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
 			   pmd_t *pmdp, struct folio *folio);
 
 int thp_enabled_parse(const char *buf, unsigned long *flags);
+int thp_defrag_parse(const char *buf, unsigned long *flags);
 const char *thp_enabled_string(unsigned long flags);
+const char *thp_defrag_string(unsigned long flags);
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline bool folio_test_pmd_mappable(struct folio *folio)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d78318782af8..a0edf15b3a07 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1634,9 +1634,11 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int memory_thp_enabled_show(struct seq_file *m, void *v);
+int memory_thp_defrag_show(struct seq_file *m, void *v);
 ssize_t memory_thp_enabled_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off);
-
+ssize_t memory_thp_defrag_write(struct kernfs_open_file *of, char *buf,
+				size_t nbytes, loff_t off);
 int mem_cgroup_thp_flags_update_all(unsigned long flags, unsigned long mask);
 unsigned long memcg_get_thp_flags_all(unsigned long mask);
 unsigned long memcg_get_thp_flags(struct vm_area_struct *vma);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fdffdfc8605c..6e1886b220d9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -311,6 +311,28 @@ const char *thp_enabled_string(unsigned long flags)
 	return output;
 }
 
+const char *thp_defrag_string(unsigned long flags)
+{
+	const char *output;
+
+	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+		     &flags))
+		output = "[always] defer defer+madvise madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
+			  &flags))
+		output = "always [defer] defer+madvise madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
+			  &flags))
+		output = "always defer [defer+madvise] madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
+			  &flags))
+		output = "always defer defer+madvise [madvise] never";
+	else
+		output = "always defer defer+madvise madvise [never]";
+
+	return output;
+}
+
 int thp_enabled_parse(const char *buf, unsigned long *flags)
 {
 	if (sysfs_streq(buf, "always")) {
@@ -328,6 +350,39 @@ int thp_enabled_parse(const char *buf, unsigned long *flags)
 	return 0;
 }
 
+int thp_defrag_parse(const char *buf, unsigned long *flags)
+{
+	if (sysfs_streq(buf, "always")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+	} else if (sysfs_streq(buf, "defer+madvise")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+	} else if (sysfs_streq(buf, "defer")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+	} else if (sysfs_streq(buf, "madvise")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+	} else if (sysfs_streq(buf, "never")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+	} else
+		return -EINVAL;
+
+	return 0;
+}
+
 #ifdef CONFIG_SYSFS
 static ssize_t enabled_show(struct kobject *kobj,
 			    struct kobj_attribute *attr, char *buf)
@@ -394,60 +449,29 @@ ssize_t single_hugepage_flag_store(struct kobject *kobj,
 static ssize_t defrag_show(struct kobject *kobj,
 			   struct kobj_attribute *attr, char *buf)
 {
-	const char *output;
-
-	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
-		     &transparent_hugepage_flags))
-		output = "[always] defer defer+madvise madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
-			  &transparent_hugepage_flags))
-		output = "always [defer] defer+madvise madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
-			  &transparent_hugepage_flags))
-		output = "always defer [defer+madvise] madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
-			  &transparent_hugepage_flags))
-		output = "always defer defer+madvise [madvise] never";
-	else
-		output = "always defer defer+madvise madvise [never]";
-
-	return sysfs_emit(buf, "%s\n", output);
+	unsigned long flags = transparent_hugepage_flags;
+
+	return sysfs_emit(buf, "%s\n", thp_defrag_string(flags));
 }
 
 static ssize_t defrag_store(struct kobject *kobj,
 			    struct kobj_attribute *attr,
 			    const char *buf, size_t count)
 {
-	if (sysfs_streq(buf, "always")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "defer+madvise")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "defer")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "madvise")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "never")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else
-		return -EINVAL;
+	ssize_t ret = count;
+	int err;
 
-	return count;
+	ret = thp_defrag_parse(buf, &transparent_hugepage_flags) ? : count;
+	if (ret > 0 && IS_ENABLED(CONFIG_MEMCG) &&
+	    !mem_cgroup_disabled()) {
+		err = mem_cgroup_thp_flags_update_all(transparent_hugepage_flags,
+						      HUGEPAGE_FLAGS_DEFRAG_MASK);
+		if (err)
+			ret = err;
+	}
+
+	return ret;
 }
 
+
 static struct kobj_attribute defrag_attr = __ATTR_RW(defrag);
 
 static ssize_t use_zero_page_show(struct kobject *kobj,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 938e6894c0b3..53384f0a69af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3706,6 +3706,8 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_MADVISE
 			(1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)|
 #endif
+			(1<<TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG)|
+			(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
 			0);
 		WRITE_ONCE(memcg->thp_anon_orders_inherit, BIT(PMD_ORDER));
 #endif
@@ -4490,6 +4492,30 @@ ssize_t memory_thp_enabled_write(struct kernfs_open_file *of, char *buf,
 	mutex_unlock(&memcg_thp_flags_mutex);
 	return ret;
 }
+
+int memory_thp_defrag_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+	unsigned long flags = READ_ONCE(memcg->thp_flags);
+
+	seq_printf(m, "%s\n", thp_defrag_string(flags));
+	return 0;
+}
+
+ssize_t memory_thp_defrag_write(struct kernfs_open_file *of, char *buf,
+				size_t nbytes, loff_t off)
+{
+	int ret = nbytes;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	buf = strstrip(buf);
+
+	mutex_lock(&memcg_thp_flags_mutex);
+	ret = thp_defrag_parse(buf, &memcg->thp_flags) ? : nbytes;
+	mutex_unlock(&memcg_thp_flags_mutex);
+
+	return ret;
+}
 #endif
 
 static struct cftype memory_files[] = {
@@ -4566,6 +4592,11 @@ static struct cftype memory_files[] = {
 		.seq_show = memory_thp_enabled_show,
 		.write = memory_thp_enabled_write,
 	},
+	{
+		.name = "thp_defrag",
+		.seq_show = memory_thp_defrag_show,
+		.write = memory_thp_defrag_write,
+	},
 #endif
 	{ }	/* terminate */
 };
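
For illustration, the per-cgroup defrag control mirrors the global
sysfs knob (paths are assumed; the bracketed entry marks the active
policy, and the default depends on the kernel configuration):

  # cat /sys/fs/cgroup/app/memory.thp_defrag
  always defer defer+madvise [madvise] never
  # echo defer+madvise > /sys/fs/cgroup/app/memory.thp_defrag
  # cat /sys/fs/cgroup/app/memory.thp_defrag
  always defer [defer+madvise] madvise never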