From patchwork Thu Apr 6 09:14:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shaun Tancheff X-Patchwork-Id: 13203030 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EC80C76196 for ; Thu, 6 Apr 2023 09:15:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CD1F26B0075; Thu, 6 Apr 2023 05:15:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C82736B0078; Thu, 6 Apr 2023 05:15:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B24106B007B; Thu, 6 Apr 2023 05:15:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 9FD436B0075 for ; Thu, 6 Apr 2023 05:15:12 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 6C7671C6C69 for ; Thu, 6 Apr 2023 09:15:12 +0000 (UTC) X-FDA: 80650407264.01.540B432 Received: from mail-pj1-f41.google.com (mail-pj1-f41.google.com [209.85.216.41]) by imf18.hostedemail.com (Postfix) with ESMTP id B26271C0005 for ; Thu, 6 Apr 2023 09:15:10 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b="pK/TUbF9"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of shaun.tancheff@gmail.com designates 209.85.216.41 as permitted sender) smtp.mailfrom=shaun.tancheff@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1680772510; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=ipUTWGxdYRj74U+3DQNtSBrS5nB5CEomlPaYC6O7z0k=; b=eYuuT6fg80SRnTtXVrSVcORV7YSzJJRShoPw7hLuMlj765U0Y8/pgtNoes12KzigkjUep0 ks1YF8BuJ2NkaUcNTDGLcEus4E9vWkkbPoynXJQfhz37zSu3Op8JmCVw1VbuY4dE1oNu97 Hhjdxgtp8iyQKthwjH4Ifw8ZABrXDIw= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b="pK/TUbF9"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of shaun.tancheff@gmail.com designates 209.85.216.41 as permitted sender) smtp.mailfrom=shaun.tancheff@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1680772510; a=rsa-sha256; cv=none; b=YWxsgp8ZdNtgNVxZqvixvpmVlJcoBrZUl97VIf8pXaM2dIQnlJpGSHs/+aOryjj3MEAaxb jhAU0l5hFeW/PrAu3tEC+4SAE4pqTKYOvifu7lmWh5guy6t5+qwnAZ6dedivUpqpFMNmS+ CeoseUzKmySAfe0b9t3oH2+xjAGQoPk= Received: by mail-pj1-f41.google.com with SMTP id mp3-20020a17090b190300b0023fcc8ce113so42250711pjb.4 for ; Thu, 06 Apr 2023 02:15:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680772509; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=ipUTWGxdYRj74U+3DQNtSBrS5nB5CEomlPaYC6O7z0k=; b=pK/TUbF9znkVQNFvKpzD8pRU/hqSACTv3jLy5n+Vt1LkSUM50jTDWJTsTz55PeX/f4 ZKkHVHIalzFZnVrmuI2CrAd/hFqi2HP7zk/rSwt8yqMpuYkPRVswuCz+BtACRsB2EfWT n49yDrGrQqSTtyDK7FoW2zjaAjOhbiXOdvCWtXYakbayY5F1ovxk/Zdo6TKVhewBEyBs R/Exit9+ZqDLQpovjLZzH1L6dXnRFA81L8x64l4os+1Rbfu3RnfmT0WfCDe5nKhRdPwt 5l0OEkpliXZm3BiHEp2HpaSjFLR0UO6J/DUl2mMQtvyrxogsuwofUcbjQa7zus+Nel3m hnxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680772509; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ipUTWGxdYRj74U+3DQNtSBrS5nB5CEomlPaYC6O7z0k=; b=e/DSFc1pZeWRR4oj5K3GCmiFxsJJ2b1XCucpXEMKicLak7VnL0vgWlRnYL2WShHV0P clq/1L7G6vVGnaEPq7BY+00dj0SkvvkHjiTwZ5HeJj1MfFVujjNUTWfGrL4G5bE4szEG whH+LvIzSpuPfaA49I/puJldHouMqeDEzAa9tNLkm0wiCbI6GZLGoEvfUXPS/Xh+vcl+ +UnAsEmBpql1Je7bfmpyoKxeaPfkwNSFY2HAIip1h48uFBAOYggTTj1KvTnh734zmkVu IemT7Npqgbta1HCU8FYGIr7FhuLJ5nFjWy+gqJDvG7ssbJfIexrT8OtkVwd5SxazmgnD 3t8Q== X-Gm-Message-State: AAQBX9fobHAJyXvzcE7GtvWtfgqEdaBWLwMtZWT4p5dVQW86XR3VscSP ZcItUPvBX4dnybWlwWB5kV8= X-Google-Smtp-Source: AKy350YoGoW2oRvGjOsx4m02H8L2jRfm606YF08ZFZ+hJKJPrWIpP8pJxUFHyb5P3hCVcz1kIHPC7Q== X-Received: by 2002:a17:90b:3907:b0:233:f354:e7df with SMTP id ob7-20020a17090b390700b00233f354e7dfmr10622120pjb.18.1680772509060; Thu, 06 Apr 2023 02:15:09 -0700 (PDT) Received: from lunar.aeonazure.com ([182.2.142.228]) by smtp.gmail.com with ESMTPSA id nh12-20020a17090b364c00b0023d16f05dd8sm760718pjb.36.2023.04.06.02.15.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Apr 2023 02:15:08 -0700 (PDT) From: Shaun Tancheff To: Johannes Weiner , Michal Hocko , Vladimir Davydov Cc: Shaun Tancheff , Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH] memcg: Default value setting in memcg-v1 Date: Thu, 6 Apr 2023 16:14:50 +0700 Message-Id: <20230406091450.167779-1-shaun.tancheff@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Rspamd-Queue-Id: B26271C0005 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: 58h9qwjuwz8ccifids8mxxzagmej6ztd X-HE-Tag: 1680772510-658365 X-HE-Meta: U2FsdGVkX1+BY2rcMa9Z4nJNAzC8P8hkJkEF83dwItyGXJNWxcSeOSKtHFNKmt6LD6P7TXasgRbxh+hN2PHpq8TASFGnrfkDldUllwCdAEtDGLIQlmdlQTd3+QRdQ0kHn/Wr1TBYmvuzRtmG75IfvIrCkanLm9a9f8eNCFIpJieFEv3QKd9kZyG/b9kTvJ4rATsWq+KFRWfsOVGElEO8CCJAZGKI5Ouf0knpQH0f9/pZ61yAv2D2aFsewpGgo6m+gv6kEhDA4jXecYsODbW2Ksp6IGDjlt5pqZg6n26Cx3gZYNI/3ryfXclBsZGGEGuO56fx4hbVWXG1navgAAZknT+8yV2qTxjgoqn7bO5ijjV4h2RkrV72qT6FpTcXsf5x/cAL1YilbLFBZ2iRWJVswyqgX6FH4J0DK6iap1zIZO0pmoWgpBiXQV9mh5dy2R+f0l5Wqc+HPxYuG2exJxkYqr8D7YsshIOP4umgo6xCfUVXlu9Gm+aQVoLcHAh/z/1PbwWTx4QYxpaPZVocuN3ZbTEvJb0Ue3v/q6R93V7pz0oDz3tryxVsXtQY5unqEx/QoGfBkdIdNRJgcYuAyAW9ISph4Rj+yOjqqZDxhg1LHhyG+y9uCb5zWF800VkohX8L3QVc2+WPkSX/jfU1LCpzmUVqQMxBrvGaMOa3WlhR067Y1PP267W/ra+vUrgaE6wuyCkFq6nXEabL4cOv4F8iV2uyozm0JfazVltNCgQ3sy6m99M2V+37NFnkW64fnlM+QQBV/t/y29oDcS+wzp2xtaqJFQkdiAzDIgFgNhZzMfBQLiaP/LvOEGaFi8U2ATr/XH2w2K64O9PW18MzrPhcYTyhAoDPY1o2zojTeOMOiNB/BlZGqhd7WBIPLmpUvAMBUFQbhBzb/AklPZL6daZrMmAcA9xYW6ePooAjEMs/YcKwkKffnowaef4Miu4X/bYOdeXhC+QH0Arn6MVFS2k 38yNDTc8 63urAmre53SUGi0L95Fith4Icew86FKPLgsBmaDT1U9w+gFma0hrU1d576/NRWf0Tstic2sdz0d/3+xuRuWTRXbZPIsNOM7Z+9rQ3FeywhCSBc4PSlVUhs6tcz/KuQtfDtlTgt6YcZqS6WXpwsHWvgSJsTT5bRsyngQWX0Zgw5hwQTnHRbJil/guB9i4me7jmHnCWKk7UUFnqyms3lXiy120jue0TExdqxh3S/+kwXS4OpkHy7h46TsZToede8nwqa2y6m+9JNDSGn4SwZNpsk3CO5KSGORWtgo9h7ktknghXDK4Br7LJixSTeerMBcKs+1OgC4dGhkcyKyEZhCep3nVy3vYf8t37ljtQ1paiMgwiWyRoaAGMlGoD5WyMGdd/TgPOE+wqowCqzyX83FI+3EbEgGoUAifr60WGakAwCJB5uZiKaTYIJkG18oEAGiopDNXVGHrTn99cTZvZBhMAzCG8af+5UjfW1snyf/lg+EUwh4Xb28IaH/71G4GdTTaGJ77PBM8Sl4VwOnIcrujPMDiHqJvUyYJuRlDyDHOlmh12KJ2eMI5SCpkxs2UOjCtAmUvmincaPaQsj3SJqLP5BygadzsWv/kXmRb0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Shaun Tancheff Setting min, low and high values with memcg-v1 provides bennefits for users that are unable to update to memcg-v2. Setting min, low and high can be set in memcg-v1 to apply enough memory pressure to effective throttle filesystem I/O without hitting memcg oom. This can be enabled by setting the sysctl values: vm.memcg_v1_min_default vm.memcg_v1_low_default vm.memcg_v1_high_default When a memory control group is newly crated the min, low and high values are set to percent of the maximum based on the min, low and high default values respectively. This resolves an issue with memory pressure when users initiate unbounded I/O on various file systems such as ext4, XFS and NFS. Signed-off-by: Shaun Tancheff --- v0: Initial hard coded limits by percent. v1: Added sysfs access and module parameters for percent values to enable v2: Fix 32-bit, remove need for missing __udivdi3 v3: Added sysctl parameters and documentation .../admin-guide/cgroup-v1/memory.rst | 33 +++++++++ Documentation/admin-guide/sysctl/vm.rst | 33 +++++++++ include/linux/memcontrol.h | 5 ++ kernel/sysctl.c | 29 ++++++++ mm/memcontrol.c | 69 ++++++++++++++++++- 5 files changed, 168 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 258e45cc3b2d..4b44e0da49d6 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -70,6 +70,15 @@ Brief summary of control files. memory.memsw.usage_in_bytes show current usage for memory+Swap (See 5.5 for details) memory.limit_in_bytes set/show limit of memory usage + memory.limit_in_bytes.min show current memory min setting not present + on the root control group. + (See sysctl's vm.memcg_v1_min_default) + memory.limit_in_bytes.low show current memory low setting not present + on the root control group. + (See sysctl's vm.memcg_v1_low_default) + memory.limit_in_bytes.high show current memory low setting not present + on the root control group. + (See sysctl's vm.memcg_v1_high_default) memory.memsw.limit_in_bytes set/show limit of memory+Swap usage memory.failcnt show the number of memory usage hits limits memory.memsw.failcnt show the number of memory+Swap hits limits @@ -648,6 +657,30 @@ The output format of memory.numa_stat is:: The "total" count is sum of file + anon + unevictable. +5.6 limit_in_bytes.low, min, and high +------------------------------------- + +These read-only values enable viewing the current low, min and high +restrictions added to a newly created cgroup when the sysctl vm +parameters: vm.memcg_v1_low_default, vm.memcg_v1_min_default, +and vm.memcg_v1_high_default are enabled. + +Example usage: + sudo sysctl -w vm.memcg_v1_min_default=10 + sudo sysctl -w vm.memcg_v1_low_default=30 + sudo sysctl -w vm.memcg_v1_high_default=80 + + sudo mkdir /sys/fs/cgroup/memory/restrict + echo 100M | sudo tee /sys/fs/cgroup/memory/restrict/memory.limit_in_bytes + cat /sys/fs/cgroup/memory/restrict/memory.limit_in_bytes.min + 2560 + cat /sys/fs/cgroup/memory/restrict/memory.limit_in_bytes.low + 7680 + cat /sys/fs/cgroup/memory/restrict/memory.limit_in_bytes.high + 20480 + echo $$ | sudo tee /sys/fs/cgroup/memory/restrict/tasks + dd if=/dev/zero of=~/file.bin bs=10M status=progress + 6. Hierarchy support ==================== diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 988f6a4c8084..87eefa165f92 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -43,6 +43,9 @@ Currently, these files are in /proc/sys/vm: - legacy_va_layout - lowmem_reserve_ratio - max_map_count +- memcg_v1_high_default +- memcg_v1_low_default +- memcg_v1_min_default - memory_failure_early_kill - memory_failure_recovery - min_free_kbytes @@ -425,6 +428,36 @@ e.g., up to one or two maps per allocation. The default value is 65530. +memcg_v1_min_default: +===================== + +This file contains a percentage of the cgroup memory limit used to +set the min value of a newly memory cgroup. This value is only used +with memory cgroup v1 interface. + +The default is 0 (disabled). Range is [0, 100]. + + +memcg_v1_low_default: +===================== + +This file contains a percentage of the cgroup memory limit used to +set the low value of a newly memory cgroup. This value is only used +with memory cgroup v1 interface. + +The default is 0 (disabled). Range is [0, 100]. + + +memcg_v1_high_default: +====================== + +This file contains a percentage of the cgroup memory limit used to +set the high value of a newly memory cgroup. This value is only used +with memory cgroup v1 interface. + +The default is 0 (disabled). Range is [0, 100]. + + memory_failure_early_kill: ========================== diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 85dc9b88ea37..0592b5e19883 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -68,6 +68,11 @@ struct mem_cgroup_id { refcount_t ref; }; +/* System default memory protection setting */ +extern int sysctl_memcg_min_default; +extern int sysctl_memcg_low_default; +extern int sysctl_memcg_high_default; + /* * Per memcg event counter is incremented at every pagein/pageout. With THP, * it will be incremented by the number of pages. This counter is used diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 1c240d2c99bc..bf923e50e597 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2449,6 +2449,35 @@ static struct ctl_table vm_table[] = { .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, +#endif +#ifdef CONFIG_MEMCG + { + .procname = "memcg_v1_min_default", + .data = &sysctl_memcg_min_default, + .maxlen = sizeof(sysctl_memcg_min_default), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE_HUNDRED, + }, + { + .procname = "memcg_v1_low_default", + .data = &sysctl_memcg_low_default, + .maxlen = sizeof(sysctl_memcg_low_default), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE_HUNDRED, + }, + { + .procname = "memcg_v1_high_default", + .data = &sysctl_memcg_high_default, + .maxlen = sizeof(sysctl_memcg_high_default), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE_HUNDRED, + }, #endif { } }; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2eee092f8f11..74875178b48b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -82,6 +82,11 @@ struct mem_cgroup *root_mem_cgroup __read_mostly; DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg); +/* System default memory protection setting */ +int sysctl_memcg_min_default __read_mostly = 0; +int sysctl_memcg_low_default __read_mostly = 0; +int sysctl_memcg_high_default __read_mostly = 0; + /* Socket memory accounting disabled? */ static bool cgroup_memory_nosocket __ro_after_init; @@ -205,6 +210,7 @@ enum res_type { _MEMSWAP, _KMEM, _TCP, + _MEM_V1, }; #define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) @@ -3676,6 +3682,9 @@ enum { RES_MAX_USAGE, RES_FAILCNT, RES_SOFT_LIMIT, + RES_LIMIT_MIN, + RES_LIMIT_LOW, + RES_LIMIT_HIGH, }; static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, @@ -3686,6 +3695,7 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, switch (MEMFILE_TYPE(cft->private)) { case _MEM: + case _MEM_V1: counter = &memcg->memory; break; case _MEMSWAP: @@ -3716,6 +3726,12 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, return counter->failcnt; case RES_SOFT_LIMIT: return (u64)memcg->soft_limit * PAGE_SIZE; + case RES_LIMIT_MIN: + return (u64)READ_ONCE(memcg->memory.min); + case RES_LIMIT_LOW: + return (u64)READ_ONCE(memcg->memory.low); + case RES_LIMIT_HIGH: + return (u64)READ_ONCE(memcg->memory.high); default: BUG(); } @@ -3815,6 +3831,34 @@ static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max) return ret; } +static inline void mem_cgroup_v1_set_defaults(struct mem_cgroup *memcg, + unsigned long nr_pages) +{ + unsigned long min, low, high; + + if (mem_cgroup_is_root(memcg) || PAGE_COUNTER_MAX == nr_pages) + return; + + min = READ_ONCE(memcg->memory.min); + low = READ_ONCE(memcg->memory.low); + if (min || low) + return; + + if (!min && sysctl_memcg_min_default > 0) { + min = (nr_pages * sysctl_memcg_min_default) / 100; + page_counter_set_min(&memcg->memory, min); + } + if (!low && sysctl_memcg_low_default > 0) { + low = (nr_pages * sysctl_memcg_low_default) / 100; + page_counter_set_low(&memcg->memory, low); + } + high = READ_ONCE(memcg->memory.high); + if (high == PAGE_COUNTER_MAX && sysctl_memcg_high_default) { + high = (nr_pages * sysctl_memcg_high_default) / 100; + page_counter_set_high(&memcg->memory, high); + } +} + /* * The user of this function is... * RES_LIMIT. @@ -3838,6 +3882,11 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of, break; } switch (MEMFILE_TYPE(of_cft(of)->private)) { + case _MEM_V1: + ret = mem_cgroup_resize_max(memcg, nr_pages, false); + if (!ret) + mem_cgroup_v1_set_defaults(memcg, nr_pages); + break; case _MEM: ret = mem_cgroup_resize_max(memcg, nr_pages, false); break; @@ -5000,10 +5049,28 @@ static struct cftype mem_cgroup_legacy_files[] = { }, { .name = "limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT), + .private = MEMFILE_PRIVATE(_MEM_V1, RES_LIMIT), .write = mem_cgroup_write, .read_u64 = mem_cgroup_read_u64, }, + { + .name = "limit_in_bytes.min", + .private = MEMFILE_PRIVATE(_MEM_V1, RES_LIMIT_MIN), + .read_u64 = mem_cgroup_read_u64, + .flags = CFTYPE_NOT_ON_ROOT, + }, + { + .name = "limit_in_bytes.low", + .private = MEMFILE_PRIVATE(_MEM_V1, RES_LIMIT_LOW), + .read_u64 = mem_cgroup_read_u64, + .flags = CFTYPE_NOT_ON_ROOT, + }, + { + .name = "limit_in_bytes.high", + .private = MEMFILE_PRIVATE(_MEM_V1, RES_LIMIT_HIGH), + .read_u64 = mem_cgroup_read_u64, + .flags = CFTYPE_NOT_ON_ROOT, + }, { .name = "soft_limit_in_bytes", .private = MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT),