From patchwork Mon Nov 8 21:19:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12609107 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 797DCC433F5 for ; Mon, 8 Nov 2021 21:20:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0379361052 for ; Mon, 8 Nov 2021 21:20:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 0379361052 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 691A66B006C; Mon, 8 Nov 2021 16:20:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6419C6B0072; Mon, 8 Nov 2021 16:20:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5313E6B0073; Mon, 8 Nov 2021 16:20:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0219.hostedemail.com [216.40.44.219]) by kanga.kvack.org (Postfix) with ESMTP id 4718E6B006C for ; Mon, 8 Nov 2021 16:20:07 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id EFE178249980 for ; Mon, 8 Nov 2021 21:20:06 +0000 (UTC) X-FDA: 78787030854.14.2F0B1CE Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf18.hostedemail.com (Postfix) with ESMTP id 7EBBD4001E9D for ; Mon, 8 Nov 2021 21:20:06 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id s189-20020a252cc6000000b005c1f206d91eso27104894ybs.14 for ; Mon, 08 Nov 2021 13:20:06 -0800 
(PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=zVeQGTqO4yrZIJdGZXiUweQmgL96nl5isxRapgZhTQ8=; b=bSjwqOIRF175FXI/JaJgPUi97vnEQj89YKHRYiosJj4YHcKTNyFDssvZ/HcCV9yiWt i2XZHALoFJZvJbeu69RSuRiQBerATpSzTpx78EHqoDnh5PwVFu0KLk9RbJJG2Wd/NXtc bwqUaD0fNzk8Cif31JYb47qcaJ/JrT4cQzeTyUdD34pKyqrtQ7EWSy8HYLFpcAYzIVDM peEZjAylvtBhdz5/M6zOsnm17XjUqHzrbbD0dHBRwY1cSzp5E6lC8fHrH8clThvYvAG2 l/J28AcKZ6pIICOui9D0Y5h3R3xKoLxeM3LneLlVYxArLmvwT1TzsBSJhBT/rrQ420k7 w0kA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=zVeQGTqO4yrZIJdGZXiUweQmgL96nl5isxRapgZhTQ8=; b=HVsyFy2kWfK4Q0O/u09HE5ZvnJL25e5Qa2mwsXk5jgWJrOBBf7z57hElArKwwZ0a8x bPQxpBFbHD5q0gkWDaVA0GEvBTnXrl+bTxiJoAyO0ttMWozH9EQHwyFwwBT41+yfOvJ7 YtjVCCodmy+4l3wtFUYL/Jl//6Mfmm54PHFupz0I0ld94r72y1ou/ykI1yRpUPYTO+jK mUyRpMrHcNCbFbPe4xd/QIRMhp5u7UDGg6m6nzLPO+rDUfxfNISUaLaQd7Z7xbGY6qjR FdBXES7sKgUYrdMfq/pUtDs+8oh6D56OLHxYmCisJFh5LiEOCc1UsaR9MFhwscQTX37j 9bzA== X-Gm-Message-State: AOAM533+YSbbVpxMnB7KlaXkOeGJemNTisTS0hm7neSGnv/29v74tAb0 lVR1zrNo+DPfsG0ZcWwysVOjCPhNRlBpKPwt+Q== X-Google-Smtp-Source: ABdhPJzTHw6oqG6AiSf3Kr3pTcKuNjPhm96OIbGQULeQ9Rn0mcghP9uLa+I5NP9i6IULx/Pj1GRCd9wLDx6C5TKX5A== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:8717:7707:fb59:664e]) (user=almasrymina job=sendgmr) by 2002:a05:6902:725:: with SMTP id l5mr3139301ybt.314.1636406405747; Mon, 08 Nov 2021 13:20:05 -0800 (PST) Date: Mon, 8 Nov 2021 13:19:55 -0800 In-Reply-To: <20211108211959.1750915-1-almasrymina@google.com> Message-Id: <20211108211959.1750915-2-almasrymina@google.com> Mime-Version: 1.0 References: <20211108211959.1750915-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc0.344.g81b53c2807-goog Subject: [PATCH v1 1/5] mm/shmem: support deterministic charging of tmpfs From: Mina Almasry Cc: 
Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 7EBBD4001E9D X-Stat-Signature: sdkkko35kz147arwiikxt1uxxbgszgjw Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=bSjwqOIR; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf18.hostedemail.com: domain of 3hZSJYQsKCOYITUIaZgUQVIOWWOTM.KWUTQVcf-UUSdIKS.WZO@flex--almasrymina.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3hZSJYQsKCOYITUIaZgUQVIOWWOTM.KWUTQVcf-UUSdIKS.WZO@flex--almasrymina.bounces.google.com X-HE-Tag: 1636406406-455080 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add memcg= option to shmem mount. Users can specify this option at mount time and all data page charges will be charged to the memcg supplied. 
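For illustration only, a userspace caller might request the new option roughly as follows. This is a sketch and not part of the patch: the helper names build_memcg_option() and mount_tmpfs_charged_to() are made up here, and it assumes the memcg= value is the filesystem path of the cgroup directory, since the patch passes the string to filp_open() with O_DIRECTORY. The mount call can only succeed on a kernel carrying this series and with sufficient privilege.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: format "memcg=<cgroup_dir>" into buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
int build_memcg_option(char *buf, size_t len, const char *cgroup_dir)
{
	int n = snprintf(buf, len, "memcg=%s", cgroup_dir);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: mount a tmpfs at target whose data pages should be
 * charged to the memcg at cgroup_dir. Returns 0 or a negative errno value.
 */
int mount_tmpfs_charged_to(const char *target, const char *cgroup_dir)
{
	char opts[4096];

	if (build_memcg_option(opts, sizeof(opts), cgroup_dir))
		return -ENAMETOOLONG;
	if (mount("tmpfs", target, "tmpfs", 0, opts))
		return -errno;
	return 0;
}
```

A caller would use it as mount_tmpfs_charged_to("/mnt/scratch", "/sys/fs/cgroup/job1"), where both paths are hypothetical.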
Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- fs/super.c | 3 ++ include/linux/fs.h | 5 ++ include/linux/memcontrol.h | 46 ++++++++++++++-- mm/filemap.c | 2 +- mm/memcontrol.c | 104 ++++++++++++++++++++++++++++++++++++- mm/shmem.c | 50 +++++++++++++++++- 6 files changed, 201 insertions(+), 9 deletions(-) -- 2.34.0.rc0.344.g81b53c2807-goog diff --git a/fs/super.c b/fs/super.c index 3bfc0f8fbd5bc..8aafe5e4e6200 100644 --- a/fs/super.c +++ b/fs/super.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include /* for the emergency remount stuff */ @@ -180,6 +181,7 @@ static void destroy_unused_super(struct super_block *s) up_write(&s->s_umount); list_lru_destroy(&s->s_dentry_lru); list_lru_destroy(&s->s_inode_lru); + mem_cgroup_set_charge_target(&s->s_memcg_to_charge, NULL); security_sb_free(s); put_user_ns(s->s_user_ns); kfree(s->s_subtype); @@ -292,6 +294,7 @@ static void __put_super(struct super_block *s) WARN_ON(s->s_dentry_lru.node); WARN_ON(s->s_inode_lru.node); WARN_ON(!list_empty(&s->s_mounts)); + mem_cgroup_set_charge_target(&s->s_memcg_to_charge, NULL); security_sb_free(s); fscrypt_sb_free(s); put_user_ns(s->s_user_ns); diff --git a/include/linux/fs.h b/include/linux/fs.h index 3afca821df32e..59407b3e7aee3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1567,6 +1567,11 @@ struct super_block { struct workqueue_struct *s_dio_done_wq; struct hlist_head s_pins; +#ifdef CONFIG_MEMCG + /* memcg to charge for pages allocated to this filesystem */ + struct mem_cgroup *s_memcg_to_charge; +#endif + /* * Owning user namespace and default context in which to * interpret filesystem uids, gids, quotas, device nodes, diff --git 
a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0c5c403f4be6b..e9a64c1b8295b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -27,6 +27,7 @@ struct obj_cgroup; struct page; struct mm_struct; struct kmem_cache; +struct super_block; /* Cgroup-specific page state, on top of universal node page state */ enum memcg_stat_item { @@ -689,7 +690,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg) page_counter_read(&memcg->memory); } -int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp); +int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp, + struct address_space *mapping); /** * mem_cgroup_charge - Charge a newly allocated folio to a cgroup. @@ -710,7 +712,7 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, { if (mem_cgroup_disabled()) return 0; - return __mem_cgroup_charge(folio, mm, gfp); + return __mem_cgroup_charge(folio, mm, gfp, NULL); } int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm, @@ -923,6 +925,16 @@ static inline bool mem_cgroup_online(struct mem_cgroup *memcg) return !!(memcg->css.flags & CSS_ONLINE); } +bool is_remote_oom(struct mem_cgroup *memcg_under_oom); +void mem_cgroup_set_charge_target(struct mem_cgroup **target, + struct mem_cgroup *memcg); +struct mem_cgroup *mem_cgroup_get_from_path(const char *path); +/** + * User is responsible for providing a buffer @buf of length @len and freeing + * it. 
+ */ +int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len); + void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, int nr_pages); @@ -1217,8 +1229,15 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg) return false; } -static inline int mem_cgroup_charge(struct folio *folio, - struct mm_struct *mm, gfp_t gfp) +static inline int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, + gfp_t gfp_mask, + struct address_space *mapping) +{ + return 0; +} + +static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, + gfp_t gfp_mask) { return 0; } @@ -1326,6 +1345,25 @@ static inline void mem_cgroup_iter_break(struct mem_cgroup *root, { } +static inline bool is_remote_oom(struct mem_cgroup *memcg_under_oom) +{ + return false; +} + +static inline void mem_cgroup_set_charge_target(struct mem_cgroup **target, + struct mem_cgroup *memcg) +{ +} + +static inline int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, + size_t len) +{ + if (len < 1) + return -EINVAL; + buf[0] = '\0'; + return 0; +} + static inline int mem_cgroup_scan_tasks(struct mem_cgroup *memcg, int (*fn)(struct task_struct *, void *), void *arg) { diff --git a/mm/filemap.c b/mm/filemap.c index 6844c9816a864..75e81dfd2c580 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -903,7 +903,7 @@ noinline int __filemap_add_folio(struct address_space *mapping, folio->index = index; if (!huge) { - error = mem_cgroup_charge(folio, NULL, gfp); + error = __mem_cgroup_charge(folio, NULL, gfp, mapping); VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio); if (error) goto error; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 781605e920153..389d2f2be9674 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2580,6 +2580,103 @@ void mem_cgroup_handle_over_high(void) css_put(&memcg->css); } +/* + * Non error return value must eventually be released with css_put(). 
+ */ +struct mem_cgroup *mem_cgroup_get_from_path(const char *path) +{ + struct file *file; + struct cgroup_subsys_state *css; + struct mem_cgroup *memcg; + + file = filp_open(path, O_DIRECTORY | O_RDONLY, 0); + if (IS_ERR(file)) + return (struct mem_cgroup *)file; + + css = css_tryget_online_from_dir(file->f_path.dentry, + &memory_cgrp_subsys); + if (IS_ERR(css)) + memcg = (struct mem_cgroup *)css; + else + memcg = container_of(css, struct mem_cgroup, css); + + fput(file); + return memcg; +} + +/* + * Get the name of the optional charge target memcg associated with @sb. This + * is the cgroup name, not the cgroup path. + */ +int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len) +{ + struct mem_cgroup *memcg; + int ret = 0; + + buf[0] = '\0'; + + rcu_read_lock(); + memcg = rcu_dereference(sb->s_memcg_to_charge); + if (memcg && !css_tryget_online(&memcg->css)) + memcg = NULL; + rcu_read_unlock(); + + if (!memcg) + return 0; + + ret = cgroup_path(memcg->css.cgroup, buf + len / 2, len / 2); + if (ret >= len / 2) + strcpy(buf, "?"); + else { + char *p = mangle_path(buf, buf + len / 2, " \t\n\\"); + + if (p) + *p = '\0'; + else + strcpy(buf, "?"); + } + + css_put(&memcg->css); + return ret < 0 ? ret : 0; +} + +/* + * Set or clear (if @memcg is NULL) charge association from file system to + * memcg. If @memcg != NULL, then a css reference must be held by the caller to + * ensure that the cgroup is not deleted during this operation. + */ +void mem_cgroup_set_charge_target(struct mem_cgroup **target, + struct mem_cgroup *memcg) +{ + if (memcg) + css_get(&memcg->css); + memcg = xchg(target, memcg); + if (memcg) + css_put(&memcg->css); +} + +/* + * Returns the memcg to charge for inode pages. If non-NULL is returned, caller + * must drop reference with css_put(). NULL indicates that the inode does not + * have a memcg to charge, so the default process based policy should be used. 
+ */ +static struct mem_cgroup * +mem_cgroup_mapping_get_charge_target(struct address_space *mapping) +{ + struct mem_cgroup *memcg; + + if (!mapping) + return NULL; + + rcu_read_lock(); + memcg = rcu_dereference(mapping->host->i_sb->s_memcg_to_charge); + if (memcg && !css_tryget_online(&memcg->css)) + memcg = NULL; + rcu_read_unlock(); + + return memcg; +} + static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages) { @@ -6678,12 +6775,15 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg, return ret; } -int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp) +int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp, + struct address_space *mapping) { struct mem_cgroup *memcg; int ret; - memcg = get_mem_cgroup_from_mm(mm); + memcg = mem_cgroup_mapping_get_charge_target(mapping); + if (!memcg) + memcg = get_mem_cgroup_from_mm(mm); ret = charge_memcg(folio, memcg, gfp); css_put(&memcg->css); diff --git a/mm/shmem.c b/mm/shmem.c index 23c91a8beb781..01510fa8ab725 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -115,10 +115,14 @@ struct shmem_options { bool full_inums; int huge; int seen; +#ifdef CONFIG_MEMCG + struct mem_cgroup *memcg; +#endif #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 #define SHMEM_SEEN_INUMS 8 +#define SHMEM_SEEN_MEMCG 16 }; #ifdef CONFIG_TMPFS @@ -709,7 +713,8 @@ static int shmem_add_to_page_cache(struct page *page, page->index = index; if (!PageSwapCache(page)) { - error = mem_cgroup_charge(page_folio(page), charge_mm, gfp); + error = __mem_cgroup_charge(page_folio(page), charge_mm, gfp, + mapping); if (error) { if (PageTransHuge(page)) { count_vm_event(THP_FILE_FALLBACK); @@ -3342,6 +3347,7 @@ static const struct export_operations shmem_export_ops = { enum shmem_param { Opt_gid, Opt_huge, + Opt_memcg, Opt_mode, Opt_mpol, Opt_nr_blocks, @@ -3363,6 +3369,7 @@ static const struct constant_table shmem_param_enums_huge[] = 
{ const struct fs_parameter_spec shmem_fs_parameters[] = { fsparam_u32 ("gid", Opt_gid), fsparam_enum ("huge", Opt_huge, shmem_param_enums_huge), + fsparam_string("memcg", Opt_memcg), fsparam_u32oct("mode", Opt_mode), fsparam_string("mpol", Opt_mpol), fsparam_string("nr_blocks", Opt_nr_blocks), @@ -3379,6 +3386,7 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) struct shmem_options *ctx = fc->fs_private; struct fs_parse_result result; unsigned long long size; + struct mem_cgroup *memcg; char *rest; int opt; @@ -3412,6 +3420,17 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) goto bad_value; ctx->seen |= SHMEM_SEEN_INODES; break; +#ifdef CONFIG_MEMCG + case Opt_memcg: + if (ctx->memcg) + css_put(&ctx->memcg->css); + memcg = mem_cgroup_get_from_path(param->string); + if (IS_ERR(memcg)) + goto bad_value; + ctx->memcg = memcg; + ctx->seen |= SHMEM_SEEN_MEMCG; + break; +#endif case Opt_mode: ctx->mode = result.uint_32 & 07777; break; @@ -3573,6 +3592,14 @@ static int shmem_reconfigure(struct fs_context *fc) } raw_spin_unlock(&sbinfo->stat_lock); mpol_put(mpol); +#ifdef CONFIG_MEMCG + if (ctx->seen & SHMEM_SEEN_MEMCG && ctx->memcg) { + mem_cgroup_set_charge_target(&fc->root->d_sb->s_memcg_to_charge, + ctx->memcg); + css_put(&ctx->memcg->css); + ctx->memcg = NULL; + } +#endif return 0; out: raw_spin_unlock(&sbinfo->stat_lock); @@ -3582,6 +3609,11 @@ static int shmem_reconfigure(struct fs_context *fc) static int shmem_show_options(struct seq_file *seq, struct dentry *root) { struct shmem_sb_info *sbinfo = SHMEM_SB(root->d_sb); + int err; + char *buf = __getname(); + + if (!buf) + return -ENOMEM; if (sbinfo->max_blocks != shmem_default_max_blocks()) seq_printf(seq, ",size=%luk", @@ -3625,7 +3657,13 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root) seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge)); #endif shmem_show_mpol(seq, sbinfo->mpol); - return 0; + /* Memory cgroup 
binding: memcg=cgroup_name */ + err = mem_cgroup_get_name_from_sb(root->d_sb, buf, PATH_MAX); + if (!err && buf[0] != '\0') + seq_printf(seq, ",memcg=%s", buf); + + __putname(buf); + return err; } #endif /* CONFIG_TMPFS */ @@ -3710,6 +3748,14 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_flags |= SB_POSIXACL; #endif uuid_gen(&sb->s_uuid); +#ifdef CONFIG_MEMCG + if (ctx->memcg) { + mem_cgroup_set_charge_target(&sb->s_memcg_to_charge, + ctx->memcg); + css_put(&ctx->memcg->css); + ctx->memcg = NULL; + } +#endif inode = shmem_get_inode(sb, NULL, S_IFDIR | sbinfo->mode, 0, VM_NORESERVE); if (!inode) From patchwork Mon Nov 8 21:19:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12609109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F081FC433EF for ; Mon, 8 Nov 2021 21:20:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 821DC61361 for ; Mon, 8 Nov 2021 21:20:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 821DC61361 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 224E46B0072; Mon, 8 Nov 2021 16:20:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D4646B0073; Mon, 8 Nov 2021 16:20:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0C5586B0074; Mon, 8 Nov 2021 16:20:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0082.hostedemail.com [216.40.44.82]) by 
kanga.kvack.org (Postfix) with ESMTP id F3AFD6B0072 for ; Mon, 8 Nov 2021 16:20:10 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id AF4471813708F for ; Mon, 8 Nov 2021 21:20:10 +0000 (UTC) X-FDA: 78787030980.15.7C61415 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) by imf02.hostedemail.com (Postfix) with ESMTP id 7C39B7001713 for ; Mon, 8 Nov 2021 21:20:04 +0000 (UTC) Received: by mail-pf1-f201.google.com with SMTP id z19-20020aa79593000000b0049472f5e52dso7655517pfj.13 for ; Mon, 08 Nov 2021 13:20:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=ZMz9mAglfr0087cpLEYUjV++mI1ePPikWKx3K/Mi3/g=; b=eQWejA7DFPbb3XHQdQsO1O81F5j5nVRtuzAWVptvU8ipbvidXvGHeRvBEGV3TQkuZF N3LeJhdmJgoOx/k2uvb5EIouPRu3C7cFY65zSv0oTcZDXNgPzd9eSIlizIPrInInwDtL 5wXMuiLX4K8Rno4G9cb4cN98D3aUBpOCjdLKD7ujQGYDACFdBqyO7WYQ8GMQND9QCR1b PT/05SKYcgXAKhwmuBt6ectzJBGyMkfPIcH5RzlDAwmZhGPxX78FC3ZWaoJFPUNPmRdk oYfNA+dmEwc22B5Pd6dhT5v9UApeCz8JPwyof+w2wq84JtHlW6pTHqan783jaEiRQAA9 bi0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=ZMz9mAglfr0087cpLEYUjV++mI1ePPikWKx3K/Mi3/g=; b=BUt2oha179XzI/wjY2hs/FQGq+87RvadSJHfeZNGv4LSfluSgyaGtH/g18ZK2NzmP5 MnUdQ9L9zUoCb4wog/dCdtlrje9MxUoMgoa6uOXs0satfi9Hru9DpeOXky82Thpmqxkt ZK7c0HJQ3+OIqnrA/AhbXZc5acEQ3+L/9hxlDP9tLtGzEoeYFQG4uHatqPVUf220sB2F tB3ltgDH36DOjewk4EXoFZDxrZwN0nYMxg3OJT74FfJU0kfY/sB9DkGyg6cPRmGBjz4R LcoQccqGa7083YWeB36PiOA0c4oG3bqWfpO4EJS/Z02Ul14UEhek4/8/XaiObL6umxb0 3yRw== X-Gm-Message-State: AOAM532q5iCJs7CEEv79Jydc7lyF5+PXvO5iYli9VBVUlmopDK51NvQh 4kRw1Zbe7VKEKtye+V87XjxwX6bgweEyPA11UQ== X-Google-Smtp-Source: 
ABdhPJyqUdcqHHS5ZEYdqfLkpdwzUGFGnjlm/Fxt/QXzAUvg/F+LJxhmAcpzJ6AmbPdNMFtj9kgXoLfmK/WLh9sHaQ== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:8717:7707:fb59:664e]) (user=almasrymina job=sendgmr) by 2002:a17:902:bb96:b0:13f:b181:58ef with SMTP id m22-20020a170902bb9600b0013fb18158efmr2368937pls.2.1636406409232; Mon, 08 Nov 2021 13:20:09 -0800 (PST) Date: Mon, 8 Nov 2021 13:19:56 -0800 In-Reply-To: <20211108211959.1750915-1-almasrymina@google.com> Message-Id: <20211108211959.1750915-3-almasrymina@google.com> Mime-Version: 1.0 References: <20211108211959.1750915-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc0.344.g81b53c2807-goog Subject: [PATCH v1 2/5] mm: add tmpfs memcg= permissions check From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 7C39B7001713 X-Stat-Signature: dc85eknzjkbu1cixmcezz1etiydki5te Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=eQWejA7D; spf=pass (imf02.hostedemail.com: domain of 3iZSJYQsKCOoMXYMedkYUZMSaaSXQ.OaYXUZgj-YYWhMOW.adS@flex--almasrymina.bounces.google.com designates 209.85.210.201 as permitted sender) smtp.mailfrom=3iZSJYQsKCOoMXYMedkYUZMSaaSXQ.OaYXUZgj-YYWhMOW.adS@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1636406404-638018 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Restricts the mounting of tmpfs: mount -t tmpfs -o memcg= Only if the mounting task is allowed to open /cgroup.procs file and allowed to enter the cgroup. 
Thus, processes are allowed to direct tmpfs charges to a cgroup that they themselves can enter and allocate memory in. Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- mm/memcontrol.c | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) -- 2.34.0.rc0.344.g81b53c2807-goog diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 389d2f2be9674..2e4c20d09f959 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -62,6 +62,7 @@ #include #include #include +#include #include "internal.h" #include #include @@ -2585,9 +2586,32 @@ void mem_cgroup_handle_over_high(void) */ struct mem_cgroup *mem_cgroup_get_from_path(const char *path) { - struct file *file; + static const char procs_filename[] = "/cgroup.procs"; + struct file *file, *procs; struct cgroup_subsys_state *css; struct mem_cgroup *memcg; + char *procs_path = + kmalloc(strlen(path) + sizeof(procs_filename), GFP_KERNEL); + + if (procs_path == NULL) + return ERR_PTR(-ENOMEM); + strcpy(procs_path, path); + strcat(procs_path, procs_filename); + + procs = filp_open(procs_path, O_WRONLY, 0); + kfree(procs_path); + + /* + * Restrict the capability for tasks to mount with memcg charging to the + * cgroup they could not join. For example, disallow: + * + * mount -t tmpfs -o memcg=root-cgroup nodev + * + * if it is a non-root task. 
+ */ + if (IS_ERR(procs)) + return (struct mem_cgroup *)procs; + fput(procs); file = filp_open(path, O_DIRECTORY | O_RDONLY, 0); if (IS_ERR(file)) From patchwork Mon Nov 8 21:19:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12609111 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8698AC4167B for ; Mon, 8 Nov 2021 21:20:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1F153614C8 for ; Mon, 8 Nov 2021 21:20:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1F153614C8 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id B74246B0073; Mon, 8 Nov 2021 16:20:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B22E66B0074; Mon, 8 Nov 2021 16:20:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A13036B0075; Mon, 8 Nov 2021 16:20:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0009.hostedemail.com [216.40.44.9]) by kanga.kvack.org (Postfix) with ESMTP id 910AD6B0073 for ; Mon, 8 Nov 2021 16:20:13 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 553597CA53 for ; Mon, 8 Nov 2021 21:20:13 +0000 (UTC) X-FDA: 78787031106.16.2386E94 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) by imf13.hostedemail.com (Postfix) with ESMTP id 91387104F7B8 for ; Mon, 8 Nov 2021 21:20:02 +0000 (UTC) 
Received: by mail-pl1-f201.google.com with SMTP id m17-20020a170902db1100b001421cb34857so6903806plx.15 for ; Mon, 08 Nov 2021 13:20:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=HuOsUesI41VMxHbd9WL9Fs5X3h8wZbAlymoJuWa+M0o=; b=mgCbALj0RzI1y7dYKpXbyaiAMLfsKBudFfa7Ub2nVulXL0OgGprL2nvZQ5jgnYWFQk FyRwZRuXkDXDZQg/EmG5RPSKJ0THWJP0Gn8tavb7JhK0Lx6F7Dj9+4l1egra+m7YtrPy lpmGx4XlaWZ9qtVHp2GizypVKABYErQ15IN9nQhWjo3/L/uwJ8MJb1L7MoiMNoVeyH0V iwZn7AfMAJUXEPjwLeaNB9ZvVpm0reoL4Ua4Yl6E+SYJvPNIlcuk4x63koSryTbpjh2C Dfoc/LjYIFQeWJUEHu9QP2jJta84c10T9HT3Jpk5hQhWEHbsuURruitDfys71o4MWJGi BxHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=HuOsUesI41VMxHbd9WL9Fs5X3h8wZbAlymoJuWa+M0o=; b=GSunfRrTPv7wc4TTXN7Z0kxRQ9+q5qHKgDP0NEVLqJB3M4Y9lJdg0rp10yVEcutUGL yN6hj0ntfsk4krntwXK6crDlYiMzVPppmmNT8tvTv8UClLLmydPhlEu2RV1aqy0q7gKl rE72iW2asN363pT91/FCb/oKNfbnMd45w1RtfjEw+rzvPHrXjV3ok5n6Qzjaqnq7YA/A ngtJ7IOiiBAZuRmAi3Hs1cqTfpkmZPz9ywcZ2wxhCj/uuYC7CIUqEizSAz9jjQfxxgBr vm5f2S7yPNVU3od0BG6gzrqmdl8FdmqWR4ti32X20htrjWu9J7LgjSpUUjp0BIOYfnQw TvyA== X-Gm-Message-State: AOAM532b9nVN1WLb7AD5rTcwNDr+HNb6q4BtwSSibqH+yDtZ2dBqAK9P wKXHPYQQhQypiBTt+iocVqdKo7Ndr41HsoH8Dg== X-Google-Smtp-Source: ABdhPJxtKMCyalpwo6Rk4XPSadnMPMqQz4M27ofn10pg+q/fy4JS+sm19K/CP1QUZKWIjfQ6AuvOtu9HpgCeOuOOeQ== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:8717:7707:fb59:664e]) (user=almasrymina job=sendgmr) by 2002:aa7:88cb:0:b0:49f:ad17:c08 with SMTP id k11-20020aa788cb000000b0049fad170c08mr2017245pff.19.1636406411895; Mon, 08 Nov 2021 13:20:11 -0800 (PST) Date: Mon, 8 Nov 2021 13:19:57 -0800 In-Reply-To: <20211108211959.1750915-1-almasrymina@google.com> Message-Id: <20211108211959.1750915-4-almasrymina@google.com> Mime-Version: 1.0 References: 
<20211108211959.1750915-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc0.344.g81b53c2807-goog Subject: [PATCH v1 3/5] mm/oom: handle remote ooms From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 91387104F7B8 X-Stat-Signature: iubrmj5iym7tf489eubb6bqhsk5i9rnt Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=mgCbALj0; spf=pass (imf13.hostedemail.com: domain of 3i5SJYQsKCOwOZaOgfmaWbOUccUZS.QcaZWbil-aaYjOQY.cfU@flex--almasrymina.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=3i5SJYQsKCOwOZaOgfmaWbOUccUZS.QcaZWbil-aaYjOQY.cfU@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1636406402-897352 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On remote ooms (OOMs due to remote charging), the oom-killer will attempt to find a task to kill in the memcg under oom. If the oom-killer is unable to find one, it should simply return ENOMEM to the allocating process. If we're in the pagefault path and unable to return ENOMEM to the allocating process, we instead kill the allocating process.
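The resulting decision can be sketched as a small standalone C function. This is only an illustrative model of the checks added to out_of_memory(), not kernel code: the enum and function names are made up here, and the boolean arguments stand in for oc->chosen, is_remote_oom(oc->memcg), current->in_user_fault and oom_unkillable_task(current).

```c
#include <stdbool.h>

/* Hypothetical labels for the possible outcomes. */
enum remote_oom_action {
	OOM_KILL_VICTIM,	/* a task in the memcg under oom was selected */
	OOM_KILL_ALLOCATOR,	/* remote oom in a user fault: kill current */
	OOM_RETURN_ENOMEM,	/* remote oom otherwise: fail the allocation */
	OOM_NO_VICTIM_LOCAL,	/* not remote, nothing found: existing path */
};

/* Mirrors the order of the checks that follow select_bad_process(). */
enum remote_oom_action remote_oom_decide(bool victim_found, bool remote,
					 bool in_user_fault,
					 bool current_unkillable)
{
	if (victim_found)
		return OOM_KILL_VICTIM;
	if (remote && in_user_fault && !current_unkillable)
		return OOM_KILL_ALLOCATOR;
	if (remote)
		return OOM_RETURN_ENOMEM;
	return OOM_NO_VICTIM_LOCAL;
}
```

Note that a remote oom in a user fault where current is unkillable falls through to the ENOMEM case, matching the order of the two new checks.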
Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- mm/memcontrol.c | 21 +++++++++++++++++++++ mm/oom_kill.c | 21 +++++++++++++++++++++ 2 files changed, 42 insertions(+) -- 2.34.0.rc0.344.g81b53c2807-goog diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2e4c20d09f959..fc9c6280266b6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2664,6 +2664,27 @@ int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len) return ret < 0 ? ret : 0; } +/* + * Returns false if current's mm is a descendant of the memcg_under_oom (or + * equal to it), true otherwise. This is used by the oom-killer to detect + * ooms due to remote charging. + */ +bool is_remote_oom(struct mem_cgroup *memcg_under_oom) +{ + struct mem_cgroup *current_memcg; + bool is_remote_oom; + + if (!memcg_under_oom) + return false; + + current_memcg = get_mem_cgroup_from_mm(current->mm); + is_remote_oom = + !mem_cgroup_is_descendant(current_memcg, memcg_under_oom); + css_put(&current_memcg->css); + + return is_remote_oom; +} + /* * Set or clear (if @memcg is NULL) charge association from file system to * memcg. If @memcg != NULL, then a css reference must be held by the caller to diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 0a7e16b16b8c3..556329dee273f 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1106,6 +1106,27 @@ bool out_of_memory(struct oom_control *oc) } select_bad_process(oc); + + /* + * For remote ooms in userfaults, we have no choice but to kill the + * allocating process. 
+ */ + if (!oc->chosen && is_remote_oom(oc->memcg) && current->in_user_fault && + !oom_unkillable_task(current)) { + get_task_struct(current); + oc->chosen = current; + oom_kill_process( + oc, "Out of memory (Killing remote allocating task)"); + return true; + } + + /* + * For remote ooms in non-userfaults, simply return ENOMEM to the + * caller. + */ + if (!oc->chosen && is_remote_oom(oc->memcg)) + return false; + /* Found nothing?!?! */ if (!oc->chosen) { dump_header(oc, NULL); From patchwork Mon Nov 8 21:19:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12609113 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3BCBC433EF for ; Mon, 8 Nov 2021 21:20:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 771C461052 for ; Mon, 8 Nov 2021 21:20:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 771C461052 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 1FEB26B0074; Mon, 8 Nov 2021 16:20:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1B0C66B0075; Mon, 8 Nov 2021 16:20:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 028D96B0078; Mon, 8 Nov 2021 16:20:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0210.hostedemail.com [216.40.44.210]) by kanga.kvack.org (Postfix) with ESMTP id E9C646B0074 for ; Mon, 8 Nov 2021 16:20:15 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com 
[10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id A82DE7CA4F for ; Mon, 8 Nov 2021 21:20:15 +0000 (UTC) X-FDA: 78787031190.21.749103D Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) by imf30.hostedemail.com (Postfix) with ESMTP id C34F5E0019A4 for ; Mon, 8 Nov 2021 21:19:55 +0000 (UTC) Received: by mail-pf1-f202.google.com with SMTP id x20-20020aa79414000000b0049f9cb19a5fso6530215pfo.17 for ; Mon, 08 Nov 2021 13:20:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=9WU77qiyFJ2kLEsta1yGwcYoFr6WpwZMlTN2QvzNdcQ=; b=XoZ7JWwmBewgPmGjnm10qp+RIw/NVsOqlra/8f6lrdUWEv3z88BW5xl/BmGAZkkpVX z4V26/l0vtZObekIpNQ9FfYR1y+YMbCt1PFXx5zNa2pn3Yras1kBCHo2liM5OI4tMNeS DTeMxphjd6Da/3anBbATiQQCseQmZwrNVBDA7y3+umUoYpuLIvfwNqTR+NH5tPb2JkCw MA5Xi0MH83tKnMBDAkMTLh1Ly8kWbBsL0dWZ5d4jGWmF27bPoDNZHLRR9nffqgmAflmV 79e58VzLJDhg/VKpYqz92xR5sXSmywy2IKW+8FrUBL0lHHTPO3pM0UH/67MUSsX4G/Xa Y1QA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=9WU77qiyFJ2kLEsta1yGwcYoFr6WpwZMlTN2QvzNdcQ=; b=fmj0qOnstyAZ2V/+HzWnfMepMku1uOTMI86L6m/93XT/ttX+sxaO95o39Kn4GGTwlP 5eyafoEuJPX4SbW3zYBF9yRHnc++4NKDAokSwPGrL/xi3QJXNooY7vdATN9uXD7lOqCR HWlpn2QbcMGTn7hTXJLCxtQkyXNYIwfRDEvvyjkoCTtOf0CBAX37ucD4jLbi1V+qD1Hf hFbG5l5aD0KKctqfZR05GDRYD9K/WrJBiZGt5TDibocCAwbdVUQK19Sn64bWn4G44XlN Zb6x2McLGWOg6qlCFJpUoFv0T244pWOPdAVZHY2jKCso/y/hsvTnCZ81oYGAJkD9fPFG 8leg== X-Gm-Message-State: AOAM5302pDKFYOQsD+JdLF7+6XaoYFsjHRqhrj0Q5o0ldQ5WZdZbTarJ jJ992x+OrxMCDZxOkkeuRvLdvo9hPCAusKpopw== X-Google-Smtp-Source: ABdhPJyiMnEsbTg2euGguvqj93KbzV4NSrI0GvzmkEHj4Gpa1PgsADOPZu4SL6FQIysaNjHUxKEEy6x0bm2X74pFeg== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:8717:7707:fb59:664e]) (user=almasrymina job=sendgmr) by 2002:a63:82c6:: with 
SMTP id w189mr1873879pgd.469.1636406414317; Mon, 08 Nov 2021 13:20:14 -0800 (PST) Date: Mon, 8 Nov 2021 13:19:58 -0800 In-Reply-To: <20211108211959.1750915-1-almasrymina@google.com> Message-Id: <20211108211959.1750915-5-almasrymina@google.com> Mime-Version: 1.0 References: <20211108211959.1750915-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc0.344.g81b53c2807-goog Subject: [PATCH v1 4/5] mm, shmem: add tmpfs memcg= option documentation From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: C34F5E0019A4 X-Stat-Signature: taq6jubiymd9ywtfpjbaj4yguuqairii Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=XoZ7JWwm; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of 3jpSJYQsKCO8RcdRjipdZeRXffXcV.TfdcZelo-ddbmRTb.fiX@flex--almasrymina.bounces.google.com designates 209.85.210.202 as permitted sender) smtp.mailfrom=3jpSJYQsKCO8RcdRjipdZeRXffXcV.TfdcZelo-ddbmRTb.fiX@flex--almasrymina.bounces.google.com X-HE-Tag: 1636406395-725055 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- Documentation/filesystems/tmpfs.rst | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) -- 2.34.0.rc0.344.g81b53c2807-goog diff --git 
a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 0408c245785e3..1ab04e8fa9222 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -137,6 +137,23 @@
 mount options. It can be added later, when the tmpfs is already mounted
 on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
 
+If CONFIG_MEMCG is enabled, tmpfs has a mount option to specify the memory
+cgroup to be charged for page allocations.
+
+memcg=/sys/fs/cgroup/unified/test/: data page allocations are charged to
+cgroup /sys/fs/cgroup/unified/test/.
+
+When a charge to the remote memcg (the memcg specified with memcg=) hits
+that memcg's limit, the oom-killer is invoked and attempts to kill a process
+in the remote memcg. If no such process is found and the charging process is
+in the pagefault path, the charging process is killed. Otherwise, the
+charging process gets an ENOMEM.
+
+Only processes that have access to /sys/fs/cgroup/unified/test/cgroup.procs can
+mount a tmpfs with memcg=/sys/fs/cgroup/unified/test. Thus, a process is able
+to charge memory to a cgroup only if it itself is able to enter that cgroup.
+ + To specify the initial root directory you can use the following mount options: From patchwork Mon Nov 8 21:19:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12609115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41307C433F5 for ; Mon, 8 Nov 2021 21:20:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E6D2261052 for ; Mon, 8 Nov 2021 21:20:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E6D2261052 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 927256B0075; Mon, 8 Nov 2021 16:20:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 88C516B0078; Mon, 8 Nov 2021 16:20:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7553E6B007B; Mon, 8 Nov 2021 16:20:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0143.hostedemail.com [216.40.44.143]) by kanga.kvack.org (Postfix) with ESMTP id 5721B6B0075 for ; Mon, 8 Nov 2021 16:20:18 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 14F1618432CAC for ; Mon, 8 Nov 2021 21:20:18 +0000 (UTC) X-FDA: 78787031316.22.DA2167E Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) by imf13.hostedemail.com (Postfix) with ESMTP id 5F233104F79E for ; Mon, 8 Nov 2021 21:20:07 +0000 (UTC) Received: by mail-pl1-f201.google.com with SMTP id 
e10-20020a17090301ca00b00141fbe2569dso7138739plh.14 for ; Mon, 08 Nov 2021 13:20:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=LgzEEqLYmw5ziRk+Zpmuo/4f8TX/yoKSNyIDF3U5Tf0=; b=dkS19bQQYgTX6/T8jpDaR8pAVWMYF9eTHs7ItoEw3LmZ6NEboTY5qgaJunrQFd4nxd hmIH43JH05TYBkiNZHmt7mz6SQpIc2T1RebaDAq+1gqUPbjJJwR8AV3q+lwvAj2efYGP Ec7gzZ2pGVc+bS/HMhHLxxkVEiilmFOfyUWuktV2i91mDTDQ+rvUrdoh03H2cpyPTqYA qLcppI1NNSqtON/IG5KuPPu/mL47x7Vh8EGcgUbF902dOTymSW8R3F8xglbi2XWS1Y8N ImS/vJb3SmqXzR0z2tm19uedWyIS8L2xDAAIxDIbtsYANfcHr9OtX5Ab8TBfp7zN+qrN GQTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=LgzEEqLYmw5ziRk+Zpmuo/4f8TX/yoKSNyIDF3U5Tf0=; b=PawpY6pIwsUBEmtbogcJ4tDvSyivKpfWm8vvCA009KXoVygZuRap0CvZ6GVPierSyK Ai3kL7bpyWHCKQQmFBsNjBsxzAVzVT6DyfnuNCEZTRbkke6veSEoJs6rmI6DP2IQVrno Jh5Fu9OE87zLlsS7oDr13hRhWj2p2OsLo0JNaXEWAXl8xRa19OGyq4XE893eKFwkqZEr B0JdASBQ0puAUTP7NMPd74OrK6eMRoh7O1h1aUGiU3SjDPDrrtVlTXjS/iEFtQv4Gdsx M+XIMPZyPdq/Peun2/8nQrwt+lVD2zFVSPROHTUKe2mCJt3qjE1eNYNNMRH/CbT78Pzm eYvQ== X-Gm-Message-State: AOAM5334llNzDGApz5iKOeGIEAZV8jhqu/6nqS2++CH3LbdUfrXkm3eE WzD6Iho5jQALs7reSAUqB0oUKLo00YYfZcfuzg== X-Google-Smtp-Source: ABdhPJz/4wwyS8huhn9kLJ1CmDJcWry/ijIFPMY/kO2XaGL2/Kt4rjB472R+qFOc6hisw6Uhe1o3ER09e9RLY12zdw== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:8717:7707:fb59:664e]) (user=almasrymina job=sendgmr) by 2002:a17:902:654d:b0:141:7df3:b94 with SMTP id d13-20020a170902654d00b001417df30b94mr2306024pln.60.1636406416836; Mon, 08 Nov 2021 13:20:16 -0800 (PST) Date: Mon, 8 Nov 2021 13:19:59 -0800 In-Reply-To: <20211108211959.1750915-1-almasrymina@google.com> Message-Id: <20211108211959.1750915-6-almasrymina@google.com> Mime-Version: 1.0 References: <20211108211959.1750915-1-almasrymina@google.com> X-Mailer: 
git-send-email 2.34.0.rc0.344.g81b53c2807-goog Subject: [PATCH v1 5/5] mm, shmem, selftests: add tmpfs memcg= mount option tests From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 5F233104F79E X-Stat-Signature: zqsp1hdqzxjhfcko6du8io79fb6s9rf6 Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=dkS19bQQ; spf=pass (imf13.hostedemail.com: domain of 3kJSJYQsKCPETefTlkrfbgTZhhZeX.Vhfebgnq-ffdoTVd.hkZ@flex--almasrymina.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=3kJSJYQsKCPETefTlkrfbgTZhhZeX.Vhfebgnq-ffdoTVd.hkZ@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1636406407-468365 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/mmap_write.c | 105 ++++++++++++++++++++++ tools/testing/selftests/vm/tmpfs-memcg.sh | 70 +++++++++++++++ 3 files changed, 176 insertions(+) create mode 100644 tools/testing/selftests/vm/mmap_write.c create mode 100755 tools/testing/selftests/vm/tmpfs-memcg.sh -- 2.34.0.rc0.344.g81b53c2807-goog diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 
2e7e86e852828..cb229974c5f15 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -19,6 +19,7 @@ madv_populate
 userfaultfd
 mlock-intersect-test
 mlock-random-test
+mmap_write
 virtual_address_range
 gup_test
 va_128TBswitch
diff --git a/tools/testing/selftests/vm/mmap_write.c b/tools/testing/selftests/vm/mmap_write.c
new file mode 100644
index 0000000000000..3afeaccea9f44
--- /dev/null
+++ b/tools/testing/selftests/vm/mmap_write.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program faults memory in tmpfs
+ */
+
+#include <err.h>
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/shm.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+
+/* Global definitions. */
+
+/* Global variables. */
+static const char *self;
+static char *shmaddr;
+static int shmid;
+
+/*
+ * Show usage and exit.
+ */
+static void exit_usage(void)
+{
+	printf("Usage: %s -p <path> -s <size>\n", self);
+	exit(EXIT_FAILURE);
+}
+
+int main(int argc, char **argv)
+{
+	int fd = 0;
+	int key = 0;
+	int *ptr = NULL;
+	int c = 0;
+	int size = 0;
+	char path[256] = "";
+	int want_sleep = 0, private = 0;
+	int populate = 0;
+	int write = 0;
+	int reserve = 1;
+
+	/* Parse command-line arguments. */
+	setvbuf(stdout, NULL, _IONBF, 0);
+	self = argv[0];
+
+	while ((c = getopt(argc, argv, ":s:p:")) != -1) {
+		switch (c) {
+		case 's':
+			size = atoi(optarg);
+			break;
+		case 'p':
+			strncpy(path, optarg, sizeof(path) - 1);
+			break;
+		default:
+			errno = EINVAL;
+			perror("Invalid arg");
+			exit_usage();
+		}
+	}
+
+	printf("%s\n", path);
+	if (strncmp(path, "", sizeof(path)) != 0) {
+		printf("Writing to this path: %s\n", path);
+	} else {
+		errno = EINVAL;
+		perror("path not found");
+		exit_usage();
+	}
+
+	if (size != 0) {
+		printf("Writing this size: %d\n", size);
+	} else {
+		errno = EINVAL;
+		perror("size not found");
+		exit_usage();
+	}
+
+	printf("Not writing to memory.\n");
+	printf("Allocating using tmpfs.\n");
+	fd = open(path, O_CREAT | O_RDWR, 0777);
+	if (fd == -1)
+		err(1, "Failed to open file.");
+
+	if (ftruncate(fd, size))
+		err(1, "failed to ftruncate %s", path);
+
+	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ptr == MAP_FAILED) {
+		close(fd);
+		err(1, "Error mapping the file");
+	}
+
+	printf("Writing to memory.\n");
+	memset(ptr, 1, size);
+	printf("Done writing to memory.\n");
+	close(fd);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/vm/tmpfs-memcg.sh b/tools/testing/selftests/vm/tmpfs-memcg.sh
new file mode 100755
index 0000000000000..fe7ffe769f903
--- /dev/null
+++ b/tools/testing/selftests/vm/tmpfs-memcg.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+CGROUP_PATH=/dev/cgroup/memory/tmpfs-memcg-test
+
+function cleanup() {
+	rm -rf /mnt/tmpfs/*
+	umount /mnt/tmpfs
+	rm -rf /mnt/tmpfs
+
+	rmdir $CGROUP_PATH
+
+	echo CLEANUP DONE
+}
+
+function setup() {
+	mkdir -p $CGROUP_PATH
+	echo $((10 * 1024 * 1024)) > $CGROUP_PATH/memory.limit_in_bytes
+	echo 0 > $CGROUP_PATH/cpuset.cpus
+	echo 0 > $CGROUP_PATH/cpuset.mems
+
+	mkdir -p /mnt/tmpfs
+
+	echo SETUP DONE
+}
+
+function expect_equal() {
+	local expected="$1"
+	local actual="$2"
+	local error="$3"
+
+	if [[ "$expected" != "$actual" ]]; then
+		echo "expected ($expected) != actual ($actual): $error" >&2
+		cleanup
+		exit 1
+	fi
+}
+
+cleanup
+setup
+
+mount -t tmpfs -o memcg=$CGROUP_PATH tmpfs /mnt/tmpfs
+
+TARGET_MEMCG_USAGE=$(cat $CGROUP_PATH/memory.usage_in_bytes)
+expect_equal 0 "$TARGET_MEMCG_USAGE" "Before echo, memcg usage should be 0"
+
+# Echo to allocate a page in the tmpfs
+echo hello > /mnt/tmpfs/test
+TARGET_MEMCG_USAGE=$(cat $CGROUP_PATH/memory.usage_in_bytes)
+expect_equal 131072 "$TARGET_MEMCG_USAGE" "After echo, memcg usage should be 131072"
+echo "Echo test succeeded"
+
+# OOM the remote container on pagefault.
+echo
+echo
+echo "OOMing the remote container using pagefault."
+echo "This will take a long time because the kernel goes through reclaim retries,"
+echo "but should eventually be OOM-killed by 'Out of memory (Killing remote allocating task)'"
+tools/testing/selftests/vm/mmap_write -p /mnt/tmpfs/test -s $((11 * 1024 * 1024))
+
+# OOM the remote container on non pagefault.
+echo
+echo
+echo "OOMing the remote container using cat (non-pagefault)"
+echo "This will take a long time because the kernel goes through reclaim retries,"
+echo "but eventually the cat command should receive an ENOMEM"
+cat /dev/random > /mnt/tmpfs/random
+
+cleanup
+echo SUCCESS
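
Note for readers of the series: the two cases exercised by tmpfs-memcg.sh
(mmap_write being killed on pagefault, cat receiving ENOMEM) follow from the
checks added to out_of_memory() in the mm/oom_kill.c hunk earlier in this
series. Below is a minimal user-space sketch of that decision, assuming the
checks run in the order shown in that hunk. It is not kernel code: the enum,
its values, and the functions remote_oom_decide() and remote_oom_outcome_name()
are hypothetical names introduced only for illustration, and the four booleans
stand in for oc->chosen, is_remote_oom(oc->memcg), current->in_user_fault and
oom_unkillable_task(current).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels; the patch itself defines no such enum. */
enum remote_oom_outcome {
	OOM_KILL_VICTIM,	/* select_bad_process() found a victim */
	OOM_KILL_ALLOCATOR,	/* remote oom in a user fault: kill current */
	OOM_RETURN_ENOMEM,	/* remote oom, current not killed: fail the charge */
	OOM_NOT_REMOTE,		/* falls through to the "Found nothing" path */
};

/*
 * Models the order of the checks in the out_of_memory() hunk:
 *   victim_found   ~ oc->chosen is set after select_bad_process()
 *   remote         ~ is_remote_oom(oc->memcg)
 *   in_user_fault  ~ current->in_user_fault
 *   unkillable     ~ oom_unkillable_task(current)
 */
enum remote_oom_outcome remote_oom_decide(bool victim_found, bool remote,
					  bool in_user_fault, bool unkillable)
{
	if (victim_found)
		return OOM_KILL_VICTIM;
	if (remote && in_user_fault && !unkillable)
		return OOM_KILL_ALLOCATOR;
	if (remote)
		return OOM_RETURN_ENOMEM;
	return OOM_NOT_REMOTE;
}

/* Maps an outcome to a short description, e.g. for logging in a test. */
const char *remote_oom_outcome_name(enum remote_oom_outcome outcome)
{
	static const char *const names[] = {
		"kill victim in the memcg under oom",
		"kill the remote allocating task",
		"return ENOMEM to the caller",
		"not a remote oom",
	};

	assert((unsigned int)outcome < sizeof(names) / sizeof(names[0]));
	return names[outcome];
}
```

In this model the mmap_write case in the test script corresponds to
remote_oom_decide(false, true, true, false), and the cat case corresponds to
remote_oom_decide(false, true, false, false). A remote allocating task that is
itself unkillable ends up on the ENOMEM path even in a user fault, which is
what the hunk does as written.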