From patchwork Wed Oct 30 08:33:11 2024
X-Patchwork-Submitter: Gutierrez Asier
X-Patchwork-Id: 13856117
Subject: [RFC PATCH 3/3] mm: Add thp_defrag control for cgroup
Date: Wed, 30 Oct 2024 16:33:11 +0800
Message-ID: <20241030083311.965933-4-gutierrez.asier@huawei-partners.com>
In-Reply-To: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>
References: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com>

From: Asier Gutierrez

This patch exposes a new file in memory cgroups, memory.thp_defrag,
which follows the format of /sys/kernel/mm/transparent_hugepage/defrag.
It also adds support for setting a different THP defrag policy for each
memory cgroup.

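For illustration only (not part of the patch): a minimal userspace sketch of the "exactly one policy bit, or none for never" scheme that thp_defrag_parse() and thp_defrag_string() implement in the diff below. The enum names, DEFRAG_POLICY_MASK, defrag_parse() and defrag_string() are hypothetical stand-ins. The sketch uses plain bit arithmetic instead of the kernel's set_bit()/clear_bit(), strcmp() instead of sysfs_streq() (so a trailing newline is not tolerated here), and returns -1 where the kernel helper returns -EINVAL.

```c
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel's defrag flag bits. */
enum defrag_bit {
	DEFRAG_DIRECT,		/* "always" */
	DEFRAG_KSWAPD,		/* "defer" */
	DEFRAG_KSWAPD_OR_MADV,	/* "defer+madvise" */
	DEFRAG_REQ_MADV,	/* "madvise" */
};

/* Only the four bits the parser touches; every other bit is preserved. */
#define DEFRAG_POLICY_MASK \
	((1UL << DEFRAG_DIRECT) | (1UL << DEFRAG_KSWAPD) | \
	 (1UL << DEFRAG_KSWAPD_OR_MADV) | (1UL << DEFRAG_REQ_MADV))

/* Select one policy bit, or none for "never". Returns 0 on success, -1 on bad input. */
static int defrag_parse(const char *buf, unsigned long *flags)
{
	unsigned long bit;

	if (!strcmp(buf, "always"))
		bit = 1UL << DEFRAG_DIRECT;
	else if (!strcmp(buf, "defer+madvise"))
		bit = 1UL << DEFRAG_KSWAPD_OR_MADV;
	else if (!strcmp(buf, "defer"))
		bit = 1UL << DEFRAG_KSWAPD;
	else if (!strcmp(buf, "madvise"))
		bit = 1UL << DEFRAG_REQ_MADV;
	else if (!strcmp(buf, "never"))
		bit = 0;
	else
		return -1;	/* unknown keyword: leave *flags untouched */

	*flags = (*flags & ~DEFRAG_POLICY_MASK) | bit;
	return 0;
}

/* Render the policy list with the active policy in brackets. */
static const char *defrag_string(unsigned long flags)
{
	if (flags & (1UL << DEFRAG_DIRECT))
		return "[always] defer defer+madvise madvise never";
	if (flags & (1UL << DEFRAG_KSWAPD))
		return "always [defer] defer+madvise madvise never";
	if (flags & (1UL << DEFRAG_KSWAPD_OR_MADV))
		return "always defer [defer+madvise] madvise never";
	if (flags & (1UL << DEFRAG_REQ_MADV))
		return "always defer defer+madvise [madvise] never";
	return "always defer defer+madvise madvise [never]";
}
```

Calling defrag_parse("defer", &flags) and then defrag_string(flags) mirrors a write followed by a read of the control file: the accepted keyword comes back in brackets, and an unknown keyword leaves the flags unchanged.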
Signed-off-by: Asier Gutierrez
Signed-off-by: Anatoly Stepanov
Reviewed-by: Alexander Kozhevnikov
---
 include/linux/huge_mm.h    |   8 +++
 include/linux/memcontrol.h |   4 +-
 mm/huge_memory.c           | 116 ++++++++++++++++++++++---------------
 mm/memcontrol.c            |  31 ++++++++++
 4 files changed, 112 insertions(+), 47 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f99ac9b7e5bc..177c7d3578ed 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -56,6 +56,12 @@ enum transparent_hugepage_flag {
 #define HUGEPAGE_FLAGS_ENABLED_MASK ((1UL << TRANSPARENT_HUGEPAGE_FLAG) |\
 		(1UL << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
 
+#define HUGEPAGE_FLAGS_DEFRAG_MASK ((1UL << TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG) |\
+		(1UL << TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
+
 struct kobject;
 struct kobj_attribute;
 
@@ -442,7 +448,9 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
 			   pmd_t *pmdp, struct folio *folio);
 
 int thp_enabled_parse(const char *buf, unsigned long *flags);
+int thp_defrag_parse(const char *buf, unsigned long *flags);
 const char *thp_enabled_string(unsigned long flags);
+const char *thp_defrag_string(unsigned long flags);
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline bool folio_test_pmd_mappable(struct folio *folio)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d78318782af8..a0edf15b3a07 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1634,9 +1634,11 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int memory_thp_enabled_show(struct seq_file *m, void *v);
+int memory_thp_defrag_show(struct seq_file *m, void *v);
 ssize_t memory_thp_enabled_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off);
-
+ssize_t memory_thp_defrag_write(struct kernfs_open_file *of, char *buf,
+				size_t nbytes, loff_t off);
 int mem_cgroup_thp_flags_update_all(unsigned long flags, unsigned long mask);
 unsigned long memcg_get_thp_flags_all(unsigned long mask);
 unsigned long memcg_get_thp_flags(struct vm_area_struct *vma);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fdffdfc8605c..6e1886b220d9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -311,6 +311,28 @@ const char *thp_enabled_string(unsigned long flags)
 	return output;
 }
 
+const char *thp_defrag_string(unsigned long flags)
+{
+	const char *output;
+
+	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+		     &flags))
+		output = "[always] defer defer+madvise madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
+			  &flags))
+		output = "always [defer] defer+madvise madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
+			  &flags))
+		output = "always defer [defer+madvise] madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
+			  &flags))
+		output = "always defer defer+madvise [madvise] never";
+	else
+		output = "always defer defer+madvise madvise [never]";
+
+	return output;
+}
+
 int thp_enabled_parse(const char *buf, unsigned long *flags)
 {
 	if (sysfs_streq(buf, "always")) {
@@ -328,6 +350,39 @@ int thp_enabled_parse(const char *buf, unsigned long *flags)
 	return 0;
 }
 
+int thp_defrag_parse(const char *buf, unsigned long *flags)
+{
+	if (sysfs_streq(buf, "always")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+	} else if (sysfs_streq(buf, "defer+madvise")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+	} else if (sysfs_streq(buf, "defer")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+	} else if (sysfs_streq(buf, "madvise")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+	} else if (sysfs_streq(buf, "never")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, flags);
+	} else
+		return -EINVAL;
+
+	return 0;
+}
+
 #ifdef CONFIG_SYSFS
 static ssize_t enabled_show(struct kobject *kobj,
 			    struct kobj_attribute *attr, char *buf)
@@ -394,60 +449,29 @@ ssize_t single_hugepage_flag_store(struct kobject *kobj,
 static ssize_t defrag_show(struct kobject *kobj,
 			   struct kobj_attribute *attr, char *buf)
 {
-	const char *output;
-
-	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
-		     &transparent_hugepage_flags))
-		output = "[always] defer defer+madvise madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
-			  &transparent_hugepage_flags))
-		output = "always [defer] defer+madvise madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
-			  &transparent_hugepage_flags))
-		output = "always defer [defer+madvise] madvise never";
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
-			  &transparent_hugepage_flags))
-		output = "always defer defer+madvise [madvise] never";
-	else
-		output = "always defer defer+madvise madvise [never]";
-
-	return sysfs_emit(buf, "%s\n", output);
+	unsigned long flags = transparent_hugepage_flags;
+	return sysfs_emit(buf, "%s\n", thp_defrag_string(flags));
 }
 
 static ssize_t defrag_store(struct kobject *kobj,
 			    struct kobj_attribute *attr,
 			    const char *buf, size_t count)
 {
-	if (sysfs_streq(buf, "always")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "defer+madvise")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "defer")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "madvise")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (sysfs_streq(buf, "never")) {
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else
-		return -EINVAL;
+	ssize_t ret = count;
+	int err;
 
-	return count;
+	ret = thp_defrag_parse(buf, &transparent_hugepage_flags) ? : count;
+	if (ret > 0 && IS_ENABLED(CONFIG_MEMCG) &&
+			!mem_cgroup_disabled()) {
+		err = mem_cgroup_thp_flags_update_all(transparent_hugepage_flags,
+					HUGEPAGE_FLAGS_DEFRAG_MASK);
+		if (err)
+			ret = err;
+	}
+
+	return ret;
 }
+
 static struct kobj_attribute defrag_attr = __ATTR_RW(defrag);
 
 static ssize_t use_zero_page_show(struct kobject *kobj,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 938e6894c0b3..53384f0a69af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3706,6 +3706,8 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_MADVISE
 (1<thp_anon_orders_inherit, BIT(PMD_ORDER));
 #endif
@@ -4490,6 +4492,30 @@ ssize_t memory_thp_enabled_write(struct kernfs_open_file *of, char *buf,
 	mutex_unlock(&memcg_thp_flags_mutex);
 	return ret;
 }
+
+int memory_thp_defrag_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+	unsigned long flags = READ_ONCE(memcg->thp_flags);
+
+	seq_printf(m, "%s\n", thp_defrag_string(flags));
+	return 0;
+}
+
+ssize_t memory_thp_defrag_write(struct kernfs_open_file *of, char *buf,
+				size_t nbytes, loff_t off)
+{
+	int ret = nbytes;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	buf = strstrip(buf);
+
+	mutex_lock(&memcg_thp_flags_mutex);
+	ret = thp_defrag_parse(buf, &memcg->thp_flags) ? : nbytes;
+	mutex_unlock(&memcg_thp_flags_mutex);
+
+	return ret;
+}
 #endif
 
 static struct cftype memory_files[] = {
@@ -4566,6 +4592,11 @@ static struct cftype memory_files[] = {
 		.seq_show = memory_thp_enabled_show,
 		.write = memory_thp_enabled_write,
 	},
+	{
+		.name = "thp_defrag",
+		.seq_show = memory_thp_defrag_show,
+		.write = memory_thp_defrag_write,
+	},
 #endif
 	{ }	/* terminate */
 };