From patchwork Fri Oct 18 00:28:05 2019
X-Patchwork-Submitter: Roman Gushchin <guro@fb.com>
X-Patchwork-Id: 11197375
From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org
Cc: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org, kernel-team@fb.com, Shakeel Butt <shakeelb@google.com>, Vladimir Davydov <vdavydov.dev@gmail.com>, Waiman Long <longman@redhat.com>, Christoph Lameter <cl@linux.com>, Roman Gushchin <guro@fb.com>
Subject: [PATCH 01/16] mm: memcg: introduce mem_cgroup_ptr
Date: Thu, 17 Oct 2019 17:28:05 -0700
Message-ID: <20191018002820.307763-2-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

This commit introduces the mem_cgroup_ptr structure and a corresponding
API. It implements a pointer to a memory cgroup with a built-in reference
counter. The main goal is to make reparenting efficient: if a number of
objects (e.g. slab pages) have to keep a pointer and a reference to a
memory cgroup, they can use mem_cgroup_ptr instead. On reparenting, only
the single mem_cgroup_ptr->memcg pointer has to be changed, instead of
walking over all accounted objects.

mem_cgroup_ptr holds a single reference to the corresponding memory
cgroup. Because it's initialized before the css reference counter, the
css refcounter can't be bumped at allocation time. Instead, it's bumped
on reparenting, which happens during offlining. A cgroup is never
released while online, so this is fine.

mem_cgroup_ptr is released using RCU, so memcg->kmem_memcg_ptr can be
accessed in an RCU read-side section. On reparenting it's atomically
switched to NULL. If a reader gets NULL, it can just read the parent's
kmem_memcg_ptr instead.

Each memory cgroup contains a list of kmem_memcg_ptrs. On reparenting
the list is spliced into the parent's list. The list is protected using
the css set lock.
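For intuition, here is a minimal user-space sketch of the same indirection
(illustrative names only; a plain atomic counter stands in for percpu_ref,
and the css refcounting is omitted):

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct memcg {
    const char *name;
};

/* Analogue of mem_cgroup_ptr: one refcounted indirection object. */
struct memcg_ptr {
    struct memcg *memcg;
    atomic_int refcnt;
};

/* Every accounted object holds a reference to the indirection object. */
struct object {
    struct memcg_ptr *owner;
};

int main(void)
{
    struct memcg child = { "child" }, parent = { "parent" };
    struct memcg_ptr *ptr = malloc(sizeof(*ptr));
    struct object objs[1000];

    ptr->memcg = &child;
    atomic_init(&ptr->refcnt, 0);

    for (int i = 0; i < 1000; i++) {
        objs[i].owner = ptr;            /* "charge" to the child */
        atomic_fetch_add(&ptr->refcnt, 1);
    }

    /*
     * Reparenting: a single pointer store, instead of walking over
     * all 1000 objects and updating each of them.
     */
    ptr->memcg = &parent;

    printf("obj[42] now belongs to: %s\n", objs[42].owner->memcg->name);
    free(ptr);
    return 0;
}

The point is the single store in the "reparenting" step: the cost no
longer scales with the number of accounted objects.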
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 50 ++++++++++++++++++++++
 mm/memcontrol.c            | 87 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 99033f9bd319..da864fded297 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -23,6 +23,7 @@
 #include <linux/page-flags.h>

 struct mem_cgroup;
+struct mem_cgroup_ptr;
 struct page;
 struct mm_struct;
 struct kmem_cache;
@@ -198,6 +199,22 @@ struct memcg_cgwb_frn {
     struct wb_completion done;    /* tracks in-flight foreign writebacks */
 };

+/*
+ * A pointer to a memory cgroup with a built-in reference counter.
+ * For a use as an intermediate object to simplify reparenting of
+ * objects charged to the cgroup. The memcg pointer can be switched
+ * to the parent cgroup without a need to modify all objects
+ * which hold the reference to the cgroup.
+ */
+struct mem_cgroup_ptr {
+    struct percpu_ref refcnt;
+    struct mem_cgroup *memcg;
+    union {
+        struct list_head list;
+        struct rcu_head rcu;
+    };
+};
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -311,6 +328,8 @@ struct mem_cgroup {
     int kmemcg_id;
     enum memcg_kmem_state kmem_state;
     struct list_head kmem_caches;
+    struct mem_cgroup_ptr __rcu *kmem_memcg_ptr;
+    struct list_head kmem_memcg_ptr_list;
 #endif

     int last_scanned_node;
@@ -439,6 +458,21 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
 {
     return css ? container_of(css, struct mem_cgroup, css) : NULL;
 }

+static inline bool mem_cgroup_ptr_tryget(struct mem_cgroup_ptr *ptr)
+{
+    return percpu_ref_tryget(&ptr->refcnt);
+}
+
+static inline void mem_cgroup_ptr_get(struct mem_cgroup_ptr *ptr)
+{
+    percpu_ref_get(&ptr->refcnt);
+}
+
+static inline void mem_cgroup_ptr_put(struct mem_cgroup_ptr *ptr)
+{
+    percpu_ref_put(&ptr->refcnt);
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
     if (memcg)
@@ -1435,6 +1469,22 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
     return memcg ? memcg->kmemcg_id : -1;
 }

+static inline struct mem_cgroup_ptr *
+mem_cgroup_get_kmem_ptr(struct mem_cgroup *memcg)
+{
+    struct mem_cgroup_ptr *memcg_ptr;
+
+    rcu_read_lock();
+    do {
+        memcg_ptr = rcu_dereference(memcg->kmem_memcg_ptr);
+        if (memcg_ptr && mem_cgroup_ptr_tryget(memcg_ptr))
+            break;
+    } while ((memcg = parent_mem_cgroup(memcg)));
+    rcu_read_unlock();
+
+    return memcg_ptr;
+}
+
 #else

 static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c3bdb8d4c355..cd8ac747827e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -258,6 +258,77 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 }

 #ifdef CONFIG_MEMCG_KMEM
+extern spinlock_t css_set_lock;
+
+static void memcg_ptr_release(struct percpu_ref *ref)
+{
+    struct mem_cgroup_ptr *ptr = container_of(ref, struct mem_cgroup_ptr,
+                                              refcnt);
+    unsigned long flags;
+
+    spin_lock_irqsave(&css_set_lock, flags);
+    list_del(&ptr->list);
+    spin_unlock_irqrestore(&css_set_lock, flags);
+
+    mem_cgroup_put(ptr->memcg);
+    percpu_ref_exit(ref);
+    kfree_rcu(ptr, rcu);
+}
+
+static int memcg_init_kmem_memcg_ptr(struct mem_cgroup *memcg)
+{
+    struct mem_cgroup_ptr *kmem_memcg_ptr;
+    int ret;
+
+    kmem_memcg_ptr = kmalloc(sizeof(struct mem_cgroup_ptr), GFP_KERNEL);
+    if (!kmem_memcg_ptr)
+        return -ENOMEM;
+
+    ret = percpu_ref_init(&kmem_memcg_ptr->refcnt, memcg_ptr_release,
+                          0, GFP_KERNEL);
+    if (ret) {
+        kfree(kmem_memcg_ptr);
+        return ret;
+    }
+
+    kmem_memcg_ptr->memcg = memcg;
+    INIT_LIST_HEAD(&kmem_memcg_ptr->list);
+    rcu_assign_pointer(memcg->kmem_memcg_ptr, kmem_memcg_ptr);
+    list_add(&kmem_memcg_ptr->list, &memcg->kmem_memcg_ptr_list);
+    return 0;
+}
+
+static void memcg_reparent_kmem_memcg_ptr(struct mem_cgroup *memcg,
+                                          struct mem_cgroup *parent)
+{
+    unsigned int nr_reparented = 0;
+    struct mem_cgroup_ptr *memcg_ptr = NULL;
+
+    rcu_swap_protected(memcg->kmem_memcg_ptr, memcg_ptr, true);
+    percpu_ref_kill(&memcg_ptr->refcnt);
+
+    /*
+     * kmem_memcg_ptr is initialized before css refcounter, so until now
+     * it doesn't hold a reference to the memcg. Bump it here.
+     */
+    css_get(&memcg->css);
+
+    spin_lock_irq(&css_set_lock);
+    list_for_each_entry(memcg_ptr, &memcg->kmem_memcg_ptr_list, list) {
+        xchg(&memcg_ptr->memcg, parent);
+        nr_reparented++;
+    }
+    if (nr_reparented)
+        list_splice(&memcg->kmem_memcg_ptr_list,
+                    &parent->kmem_memcg_ptr_list);
+    spin_unlock_irq(&css_set_lock);
+
+    if (nr_reparented) {
+        css_get_many(&parent->css, nr_reparented);
+        css_put_many(&memcg->css, nr_reparented);
+    }
+}
+
 /*
  * This will be the memcg's index in each cache's ->memcg_params.memcg_caches.
  * The main reason for not using cgroup id for this:
@@ -3572,7 +3643,7 @@ static void memcg_flush_percpu_vmevents(struct mem_cgroup *memcg)
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
-    int memcg_id;
+    int memcg_id, ret;

     if (cgroup_memory_nokmem)
         return 0;
@@ -3584,6 +3655,12 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
     if (memcg_id < 0)
         return memcg_id;

+    ret = memcg_init_kmem_memcg_ptr(memcg);
+    if (ret) {
+        memcg_free_cache_id(memcg_id);
+        return ret;
+    }
+
     static_branch_inc(&memcg_kmem_enabled_key);
     /*
      * A memory cgroup is considered kmem-online as soon as it gets
@@ -3619,12 +3696,13 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
         parent = root_mem_cgroup;

     /*
-     * Deactivate and reparent kmem_caches. Then flush percpu
-     * slab statistics to have precise values at the parent and
-     * all ancestor levels. It's required to keep slab stats
+     * Deactivate and reparent kmem_caches and reparent kmem_memcg_ptr.
+     * Then flush percpu slab statistics to have precise values at the
+     * parent and all ancestor levels. It's required to keep slab stats
      * accurate after the reparenting of kmem_caches.
      */
     memcg_deactivate_kmem_caches(memcg, parent);
+    memcg_reparent_kmem_memcg_ptr(memcg, parent);
     memcg_flush_percpu_vmstats(memcg, true);

     kmemcg_id = memcg->kmemcg_id;
@@ -5180,6 +5258,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
     memcg->socket_pressure = jiffies;
 #ifdef CONFIG_MEMCG_KMEM
     memcg->kmemcg_id = -1;
+    INIT_LIST_HEAD(&memcg->kmem_memcg_ptr_list);
 #endif
 #ifdef CONFIG_CGROUP_WRITEBACK
     INIT_LIST_HEAD(&memcg->cgwb_list);
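The subtle part of the reparenting path above is the reader side: once
kmem_memcg_ptr is switched to NULL, mem_cgroup_get_kmem_ptr() falls back
to the parent. A single-threaded user-space sketch of that fallback
(illustrative names; RCU, refcounting and locking omitted):

#include <stdio.h>

struct memcg {
    const char *name;
    struct memcg *parent;
    struct memcg_ptr *kmem_ptr;     /* NULL once the cgroup is offlined */
};

struct memcg_ptr {
    struct memcg *memcg;
};

/*
 * Mirrors the loop in mem_cgroup_get_kmem_ptr(): walk up until a live
 * pointer is found (the root always has one).
 */
static struct memcg_ptr *get_kmem_ptr(struct memcg *memcg)
{
    do {
        if (memcg->kmem_ptr)
            return memcg->kmem_ptr;
    } while ((memcg = memcg->parent));
    return NULL;
}

int main(void)
{
    struct memcg_ptr root_ptr, child_ptr;
    struct memcg root = { "root", NULL, &root_ptr };
    struct memcg child = { "child", &root, &child_ptr };

    root_ptr.memcg = &root;
    child_ptr.memcg = &child;

    printf("before offline: %s\n", get_kmem_ptr(&child)->memcg->name);

    /* Offline the child: detach its pointer, readers fall back. */
    child.kmem_ptr = NULL;
    child_ptr.memcg = &root;        /* existing holders now point at root */

    printf("after offline:  %s\n", get_kmem_ptr(&child)->memcg->name);
    return 0;
}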
From patchwork Fri Oct 18 00:28:06 2019
X-Patchwork-Submitter: Roman Gushchin <guro@fb.com>
X-Patchwork-Id: 11197379
From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org
Cc: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org, kernel-team@fb.com, Shakeel Butt <shakeelb@google.com>, Vladimir Davydov <vdavydov.dev@gmail.com>, Waiman Long <longman@redhat.com>, Christoph Lameter <cl@linux.com>, Roman Gushchin <guro@fb.com>
Subject: [PATCH 02/16] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat
Date: Thu, 17 Oct 2019 17:28:06 -0700
Message-ID: <20191018002820.307763-3-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

Currently the s8 type is used for per-cpu caching of per-node statistics.
It works fine because the overfill threshold can't exceed 125. But if
some counters are in bytes (and the next commit in the series will
convert slab counters to bytes), it won't work anymore: a value in bytes
can easily exceed the s8 range without exceeding the threshold converted
to bytes. So to avoid overfilling per-cpu caches and breaking vmstat
correctness, let's use s32 instead.

This doesn't affect per-zone statistics. There are no plans to use
zone-level byte-sized counters, so there is no reason to change anything
there.
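To see the failure mode concretely: with the maximum threshold of 125 and
4 KB pages, a single page-sized delta in bytes already wraps an s8
counter. A small user-space demonstration (a PAGE_SIZE of 4096 is assumed
for illustration):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t  diff8  = 0;     /* the old s8 per-cpu counter */
    int32_t diff32 = 0;     /* the new s32 per-cpu counter */
    int32_t threshold_bytes = 125 << 12;    /* 125 pages * 4096 bytes */

    /* Account one 4096-byte slab allocation. */
    diff8  += 4096;         /* truncated: 4096 % 256 == 0 */
    diff32 += 4096;

    printf("s8 counter:  %d (wrapped, stats silently corrupted)\n", diff8);
    printf("s32 counter: %d (still far below threshold %d)\n",
           diff32, threshold_bytes);
    return 0;
}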
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/mmzone.h |  2 +-
 mm/vmstat.c            | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d4ca03b93373..4d8adaa5d59c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -353,7 +353,7 @@ struct per_cpu_pageset {

 struct per_cpu_nodestat {
     s8 stat_threshold;
-    s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
+    s32 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
 };

 #endif /* !__GENERATING_BOUNDS.H */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b2fd344d2fcf..3375e7f45891 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -337,7 +337,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
                 long delta)
 {
     struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-    s8 __percpu *p = pcp->vm_node_stat_diff + item;
+    s32 __percpu *p = pcp->vm_node_stat_diff + item;
     long x;
     long t;

@@ -395,13 +395,13 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
     struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-    s8 __percpu *p = pcp->vm_node_stat_diff + item;
-    s8 v, t;
+    s32 __percpu *p = pcp->vm_node_stat_diff + item;
+    s32 v, t;

     v = __this_cpu_inc_return(*p);
     t = __this_cpu_read(pcp->stat_threshold);
     if (unlikely(v > t)) {
-        s8 overstep = t >> 1;
+        s32 overstep = t >> 1;

         node_page_state_add(v + overstep, pgdat, item);
         __this_cpu_write(*p, -overstep);
@@ -439,13 +439,13 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
     struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-    s8 __percpu *p = pcp->vm_node_stat_diff + item;
-    s8 v, t;
+    s32 __percpu *p = pcp->vm_node_stat_diff + item;
+    s32 v, t;

     v = __this_cpu_dec_return(*p);
     t = __this_cpu_read(pcp->stat_threshold);
     if (unlikely(v < - t)) {
-        s8 overstep = t >> 1;
+        s32 overstep = t >> 1;

         node_page_state_add(v - overstep, pgdat, item);
         __this_cpu_write(*p, overstep);
@@ -538,7 +538,7 @@ static inline void mod_node_state(struct pglist_data *pgdat,
                   enum node_stat_item item, int delta, int overstep_mode)
 {
     struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-    s8 __percpu *p = pcp->vm_node_stat_diff + item;
+    s32 __percpu *p = pcp->vm_node_stat_diff + item;
     long o, n, t, z;

     do {

From patchwork Fri Oct 18 00:28:07 2019
X-Patchwork-Submitter: Roman Gushchin <guro@fb.com>
X-Patchwork-Id: 11197389
From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org
Cc: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org, kernel-team@fb.com, Shakeel Butt <shakeelb@google.com>, Vladimir Davydov <vdavydov.dev@gmail.com>, Waiman Long <longman@redhat.com>, Christoph Lameter <cl@linux.com>, Roman Gushchin <guro@fb.com>
Subject: [PATCH 03/16] mm: vmstat: convert slab vmstat counter to bytes
Date: Thu, 17 Oct 2019 17:28:07 -0700
Message-ID: <20191018002820.307763-4-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>
In order to prepare for per-object slab memory accounting, convert the
NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes. To
make it clear that these vmstats are in bytes, rename them to
NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
NR_KERNEL_STACK_KB).

The size of slab memory shouldn't exceed 4 GB on 32-bit machines, so it
will fit into the atomic_long_t used for vmstats.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 drivers/base/node.c     | 14 +++++++++-----
 fs/proc/meminfo.c       |  4 ++--
 include/linux/mmzone.h  | 10 ++++++++--
 include/linux/vmstat.h  |  8 ++++++++
 kernel/power/snapshot.c |  2 +-
 mm/memcontrol.c         | 29 ++++++++++++++++-------------
 mm/oom_kill.c           |  2 +-
 mm/page_alloc.c         |  8 ++++----
 mm/slab.h               | 15 ++++++++-------
 mm/slab_common.c        |  4 ++--
 mm/slob.c               | 12 ++++++------
 mm/slub.c               |  8 ++++----
 mm/vmscan.c             |  3 ++-
 mm/vmstat.c             | 21 +++++++++++++++++--
 mm/workingset.c         |  6 ++++--
 15 files changed, 94 insertions(+), 52 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 98a31bafc8a2..fa07c5806dcd 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -368,8 +368,8 @@ static ssize_t node_read_meminfo(struct device *dev,
     unsigned long sreclaimable, sunreclaimable;

     si_meminfo_node(&i, nid);
-    sreclaimable = node_page_state(pgdat, NR_SLAB_RECLAIMABLE);
-    sunreclaimable = node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE);
+    sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
+    sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
     n = sprintf(buf,
             "Node %d MemTotal:       %8lu kB\n"
             "Node %d MemFree:        %8lu kB\n"
@@ -505,9 +505,13 @@ static ssize_t node_read_vmstat(struct device *dev,
                  sum_zone_numa_state(nid, i));
 #endif

-    for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
-        n += sprintf(buf+n, "%s %lu\n", node_stat_name(i),
-                 node_page_state(pgdat, i));
+    for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+        unsigned long x = node_page_state(pgdat, i);
+
+        if (vmstat_item_in_bytes(i))
+            x >>= PAGE_SHIFT;
+        n += sprintf(buf+n, "%s %lu\n", node_stat_name(i), x);
+    }

     return n;
 }
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ac9247371871..87afa8683c1b 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -53,8 +53,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
         pages[lru] = global_node_page_state(NR_LRU_BASE + lru);

     available = si_mem_available();
-    sreclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE);
-    sunreclaim = global_node_page_state(NR_SLAB_UNRECLAIMABLE);
+    sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
+    sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);

     show_val_kb(m, "MemTotal:       ", i.totalram);
     show_val_kb(m, "MemFree:        ", i.freeram);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4d8adaa5d59c..e20c200bff95 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -215,8 +215,8 @@ enum node_stat_item {
     NR_INACTIVE_FILE,    /*  "     "     "   "       "         */
     NR_ACTIVE_FILE,      /*  "     "     "   "       "         */
     NR_UNEVICTABLE,      /*  "     "     "   "       "         */
-    NR_SLAB_RECLAIMABLE, /* Please do not reorder this item */
-    NR_SLAB_UNRECLAIMABLE,/* and this one without looking at
+    NR_SLAB_RECLAIMABLE_B, /* Please do not reorder this item */
+    NR_SLAB_UNRECLAIMABLE_B,/* and this one without looking at
                            * memcg_flush_percpu_vmstats() first. */
     NR_ISOLATED_ANON,    /* Temporary isolated pages from anon lru */
     NR_ISOLATED_FILE,    /* Temporary isolated pages from file lru */
@@ -247,6 +247,12 @@ enum node_stat_item {
     NR_VM_NODE_STAT_ITEMS
 };

+static __always_inline bool vmstat_item_in_bytes(enum node_stat_item item)
+{
+    return (item == NR_SLAB_RECLAIMABLE_B ||
+        item == NR_SLAB_UNRECLAIMABLE_B);
+}
+
 /*
  * We do arithmetic on the LRU lists in various places in the code,
  * so it is important to keep the active lists LRU_ACTIVE higher in
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 292485f3d24d..5b0f61b3ca75 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -200,6 +200,12 @@ static inline unsigned long global_node_page_state(enum node_stat_item item)
     return x;
 }

+static inline
+unsigned long global_node_page_state_pages(enum node_stat_item item)
+{
+    return global_node_page_state(item) >> PAGE_SHIFT;
+}
+
 static inline unsigned long zone_page_state(struct zone *zone,
                     enum zone_stat_item item)
 {
@@ -240,6 +246,8 @@ extern unsigned long sum_zone_node_page_state(int node,
 extern unsigned long sum_zone_numa_state(int node, enum numa_stat_item item);
 extern unsigned long node_page_state(struct pglist_data *pgdat,
                         enum node_stat_item item);
+extern unsigned long node_page_state_pages(struct pglist_data *pgdat,
+                       enum node_stat_item item);
 #else
 #define sum_zone_node_page_state(node, item) global_zone_page_state(item)
 #define node_page_state(node, item) global_node_page_state(item)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 26b9168321e7..d3911f75c39b 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1666,7 +1666,7 @@ static unsigned long minimum_image_size(unsigned long saveable)
 {
     unsigned long size;

-    size = global_node_page_state(NR_SLAB_RECLAIMABLE)
+    size = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B)
         + global_node_page_state(NR_ACTIVE_ANON)
         + global_node_page_state(NR_INACTIVE_ANON)
         + global_node_page_state(NR_ACTIVE_FILE)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cd8ac747827e..9303e98b0718 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -751,13 +751,16 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 {
-    long x;
+    long x, threshold = MEMCG_CHARGE_BATCH;

     if (mem_cgroup_disabled())
         return;

+    if (vmstat_item_in_bytes(idx))
+        threshold <<= PAGE_SHIFT;
+
     x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
-    if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+    if (unlikely(abs(x) > threshold)) {
         struct mem_cgroup *mi;

         /*
@@ -799,7 +802,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
     pg_data_t *pgdat = lruvec_pgdat(lruvec);
     struct mem_cgroup_per_node *pn;
     struct mem_cgroup *memcg;
-    long x;
+    long x, threshold = MEMCG_CHARGE_BATCH;

     /* Update node */
     __mod_node_page_state(pgdat, idx, val);
@@ -816,8 +819,11 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
     /* Update lruvec */
     __this_cpu_add(pn->lruvec_stat_local->count[idx], val);

+    if (vmstat_item_in_bytes(idx))
+        threshold <<= PAGE_SHIFT;
+
     x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
-    if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+    if (unlikely(abs(x) > threshold)) {
         struct mem_cgroup_per_node *pi;

         for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
@@ -1466,9 +1472,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
                (u64)memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) *
                1024);
     seq_buf_printf(&s, "slab %llu\n",
-               (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE) +
-                 memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE)) *
-               PAGE_SIZE);
+               (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) +
+                 memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B)));
     seq_buf_printf(&s, "sock %llu\n",
                (u64)memcg_page_state(memcg, MEMCG_SOCK) *
                PAGE_SIZE);
@@ -1502,11 +1507,9 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
                PAGE_SIZE);
     seq_buf_printf(&s, "slab_reclaimable %llu\n",
-               (u64)memcg_page_state(memcg, NR_SLAB_RECLAIMABLE) *
-               PAGE_SIZE);
+               (u64)memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B));
     seq_buf_printf(&s, "slab_unreclaimable %llu\n",
-               (u64)memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE) *
-               PAGE_SIZE);
+               (u64)memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B));

     /* Accumulated memory events */

@@ -3582,8 +3585,8 @@ static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg, bool slab_only)
     int min_idx, max_idx;

     if (slab_only) {
-        min_idx = NR_SLAB_RECLAIMABLE;
-        max_idx = NR_SLAB_UNRECLAIMABLE;
+        min_idx = NR_SLAB_RECLAIMABLE_B;
+        max_idx = NR_SLAB_UNRECLAIMABLE_B;
     } else {
         min_idx = 0;
         max_idx = MEMCG_NR_STAT;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 314ce1a3cf25..61476bf0f147 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -183,7 +183,7 @@ static bool is_dump_unreclaim_slabs(void)
          global_node_page_state(NR_ISOLATED_FILE) +
          global_node_page_state(NR_UNEVICTABLE);

-    return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru);
+    return (global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B) > nr_lru);
 }

 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd1dd0712624..c037bdaa73a1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5130,8 +5130,8 @@ long si_mem_available(void)
      * items that are in use, and cannot be freed. Cap this estimate at the
      * low watermark.
      */
-    reclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE) +
-            global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
+    reclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) +
+            global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
     available += reclaimable - min(reclaimable / 2, wmark_low);

     if (available < 0)
@@ -5275,8 +5275,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
         global_node_page_state(NR_FILE_DIRTY),
         global_node_page_state(NR_WRITEBACK),
         global_node_page_state(NR_UNSTABLE_NFS),
-        global_node_page_state(NR_SLAB_RECLAIMABLE),
-        global_node_page_state(NR_SLAB_UNRECLAIMABLE),
+        global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B),
+        global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B),
         global_node_page_state(NR_FILE_MAPPED),
         global_node_page_state(NR_SHMEM),
         global_zone_page_state(NR_PAGETABLE),
diff --git a/mm/slab.h b/mm/slab.h
index 3eb29ae75743..03833b02b9ae 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -272,7 +272,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *, gfp_t, size_t, void **);
 static inline int cache_vmstat_idx(struct kmem_cache *s)
 {
     return (s->flags & SLAB_RECLAIM_ACCOUNT) ?
-        NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE;
+        NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
 }

 #ifdef CONFIG_MEMCG_KMEM
@@ -360,7 +360,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
     if (unlikely(!memcg || mem_cgroup_is_root(memcg))) {
         mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-                    (1 << order));
+                    (PAGE_SIZE << order));
         percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
         return 0;
     }
@@ -370,7 +370,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
         goto out;

     lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-    mod_lruvec_state(lruvec, cache_vmstat_idx(s), 1 << order);
+    mod_lruvec_state(lruvec, cache_vmstat_idx(s), PAGE_SIZE << order);

     /* transer try_charge() page references to kmem_cache */
     percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
@@ -394,11 +394,12 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
     memcg = READ_ONCE(s->memcg_params.memcg);
     if (likely(!mem_cgroup_is_root(memcg))) {
         lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-        mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
+        mod_lruvec_state(lruvec, cache_vmstat_idx(s),
+                 -(PAGE_SIZE << order));
         memcg_kmem_uncharge_memcg(page, order, memcg);
     } else {
         mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-                    -(1 << order));
+                    -(PAGE_SIZE << order));
     }

     rcu_read_unlock();
@@ -482,7 +483,7 @@ static __always_inline int charge_slab_page(struct page *page,
 {
     if (is_root_cache(s)) {
         mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-                    1 << order);
+                    PAGE_SIZE << order);
         return 0;
     }

@@ -494,7 +495,7 @@ static __always_inline void uncharge_slab_page(struct page *page, int order,
 {
     if (is_root_cache(s)) {
         mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-                    -(1 << order));
+                    -(PAGE_SIZE << order));
         return;
     }

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8afa188f6e20..f0f7f955c5fa 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1311,8 +1311,8 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
     page = alloc_pages(flags, order);
     if (likely(page)) {
         ret = page_address(page);
-        mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
-                    1 << order);
+        mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B,
+                    PAGE_SIZE << order);
     }
     ret = kasan_kmalloc_large(ret, size, flags);
     /* As ret might get tagged, call kmemleak hook after KASAN. */
diff --git a/mm/slob.c b/mm/slob.c
index fa53e9f73893..8b7b56235438 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -202,8 +202,8 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
     if (!page)
         return NULL;

-    mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
-                1 << order);
+    mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B,
+                PAGE_SIZE << order);
     return page_address(page);
 }

@@ -214,8 +214,8 @@ static void slob_free_pages(void *b, int order)
     if (current->reclaim_state)
         current->reclaim_state->reclaimed_slab += 1 << order;

-    mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE,
-                -(1 << order));
+    mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
+                -(PAGE_SIZE << order));
     __free_pages(sp, order);
 }

@@ -550,8 +550,8 @@ void kfree(const void *block)
         slob_free(m, *m + align);
     } else {
         unsigned int order = compound_order(sp);
-        mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE,
-                    -(1 << order));
+        mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
+                    -(PAGE_SIZE << order));
         __free_pages(sp, order);

     }
diff --git a/mm/slub.c b/mm/slub.c
index 249e4c8be66a..bd902d65a71c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3838,8 +3838,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
     page = alloc_pages_node(node, flags, order);
     if (page) {
         ptr = page_address(page);
-        mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
-                    1 << order);
+        mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B,
+                    PAGE_SIZE << order);
     }

     return kmalloc_large_node_hook(ptr, size, flags);
@@ -3970,8 +3970,8 @@ void kfree(const void *x)
         BUG_ON(!PageCompound(page));
         kfree_hook(object);
-        mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
-                    -(1 << order));
+        mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B,
+                    -(PAGE_SIZE << order));
         __free_pages(page, order);
         return;
     }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 50c81463797b..990e4b455b7d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4272,7 +4272,8 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
      * unmapped file backed pages.
      */
     if (node_pagecache_reclaimable(pgdat) <= pgdat->min_unmapped_pages &&
-        node_page_state(pgdat, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
+        node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) <=
+        pgdat->min_slab_pages)
         return NODE_RECLAIM_FULL;

     /*
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3375e7f45891..1d10ffcecb9f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -344,6 +344,8 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,

     x = delta + __this_cpu_read(*p);
     t = __this_cpu_read(pcp->stat_threshold);
+    if (vmstat_item_in_bytes(item))
+        t <<= PAGE_SHIFT;

     if (unlikely(x > t || x < -t)) {
         node_page_state_add(x, pgdat, item);
@@ -555,6 +557,8 @@ static inline void mod_node_state(struct pglist_data *pgdat,
      * for all cpus in a node.
      */
     t = this_cpu_read(pcp->stat_threshold);
+    if (vmstat_item_in_bytes(item))
+        t <<= PAGE_SHIFT;

     o = this_cpu_read(*p);
     n = delta + o;
@@ -999,6 +1003,12 @@ unsigned long node_page_state(struct pglist_data *pgdat,
 #endif
     return x;
 }
+
+unsigned long node_page_state_pages(struct pglist_data *pgdat,
+                    enum node_stat_item item)
+{
+    return node_page_state(pgdat, item) >> PAGE_SHIFT;
+}
 #endif

 #ifdef CONFIG_COMPACTION
@@ -1548,8 +1558,12 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
     if (is_zone_first_populated(pgdat, zone)) {
         seq_printf(m, "\n  per-node stats");
         for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+            unsigned long x = node_page_state(pgdat, i);
+
+            if (vmstat_item_in_bytes(i))
+                x >>= PAGE_SHIFT;
             seq_printf(m, "\n      %-12s %lu", node_stat_name(i),
-                   node_page_state(pgdat, i));
+                   x);
         }
     }
     seq_printf(m,
@@ -1671,8 +1685,11 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
     v += NR_VM_NUMA_STAT_ITEMS;
 #endif

-    for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+    for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
         v[i] = global_node_page_state(i);
+        if (vmstat_item_in_bytes(i))
+            v[i] >>= PAGE_SHIFT;
+    }
     v += NR_VM_NODE_STAT_ITEMS;

     global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD,
diff --git a/mm/workingset.c b/mm/workingset.c
index c963831d354f..675792387e58 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -430,8 +430,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
         for (pages = 0, i = 0; i < NR_LRU_LISTS; i++)
             pages += lruvec_page_state_local(lruvec,
                              NR_LRU_BASE + i);
-        pages += lruvec_page_state_local(lruvec, NR_SLAB_RECLAIMABLE);
-        pages += lruvec_page_state_local(lruvec, NR_SLAB_UNRECLAIMABLE);
+        pages += lruvec_page_state_local(
+            lruvec, NR_SLAB_RECLAIMABLE_B) >> PAGE_SHIFT;
+        pages += lruvec_page_state_local(
+            lruvec, NR_SLAB_UNRECLAIMABLE_B) >> PAGE_SHIFT;
     } else
 #endif
         pages = node_present_pages(sc->nid);
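The convention used throughout the patch: byte-valued items accumulate in
bytes and are shifted down by PAGE_SHIFT only at the reporting edges
(node_page_state_pages() and friends). A small user-space model of that
convention (names and the PAGE_SHIFT value are illustrative):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

enum stat_item { NR_FILE_PAGES, NR_SLAB_RECLAIMABLE_B, NR_ITEMS };

/* Byte-valued items are flagged, mirroring vmstat_item_in_bytes(). */
static int item_in_bytes(enum stat_item item)
{
    return item == NR_SLAB_RECLAIMABLE_B;
}

static unsigned long stats[NR_ITEMS];

/* Charging side: slab items take deltas in bytes, others in pages. */
static void mod_state(enum stat_item item, long delta)
{
    stats[item] += delta;
}

/* Reporting side: convert byte-valued items back to pages. */
static unsigned long state_pages(enum stat_item item)
{
    unsigned long x = stats[item];

    return item_in_bytes(item) ? x >> PAGE_SHIFT : x;
}

int main(void)
{
    mod_state(NR_FILE_PAGES, 3);                      /* 3 pages */
    mod_state(NR_SLAB_RECLAIMABLE_B, 5 * PAGE_SIZE);  /* 5 pages of slab */

    printf("file pages: %lu\n", state_pages(NR_FILE_PAGES));
    printf("slab pages: %lu\n", state_pages(NR_SLAB_RECLAIMABLE_B));
    return 0;
}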
From patchwork Fri Oct 18 00:28:08 2019
X-Patchwork-Submitter: Roman Gushchin <guro@fb.com>
X-Patchwork-Id: 11197399
From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org
Cc: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org, kernel-team@fb.com, Shakeel Butt <shakeelb@google.com>, Vladimir Davydov <vdavydov.dev@gmail.com>, Waiman Long <longman@redhat.com>, Christoph Lameter <cl@linux.com>, Roman Gushchin <guro@fb.com>
Subject: [PATCH 04/16] mm: memcg/slab: allocate space for memcg ownership data for non-root slabs
Date: Thu, 17 Oct 2019 17:28:08 -0700
Message-ID: <20191018002820.307763-5-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>
Allocate and release memory for storing the memcg ownership data. For
each slab page, allocate space sufficient for number_of_objects pointers
to struct mem_cgroup_ptr.

The mem_cgroup field of struct page isn't used for slab pages, so we can
reuse it to store a pointer to the allocated space.

This commit makes sure that the space is ready for use, but nothing is
actually using it yet. The following commits in the series will start
using it.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/mm_types.h |  5 ++++-
 mm/slab.c                |  3 ++-
 mm/slab.h                | 37 ++++++++++++++++++++++++++++++++++++-
 mm/slub.c                |  2 +-
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2222fa795284..4d99ee5a9c53 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -198,7 +198,10 @@ struct page {
     atomic_t _refcount;

 #ifdef CONFIG_MEMCG
-    struct mem_cgroup *mem_cgroup;
+    union {
+        struct mem_cgroup *mem_cgroup;
+        struct mem_cgroup_ptr **mem_cgroup_vec;
+    };
 #endif

     /*
diff --git a/mm/slab.c b/mm/slab.c
index f1e1840af533..ffa16dd966ef 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1370,7 +1370,8 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
         return NULL;
     }

-    if (charge_slab_page(page, flags, cachep->gfporder, cachep)) {
+    if (charge_slab_page(page, flags, cachep->gfporder, cachep,
+                 cachep->num)) {
         __free_pages(page, cachep->gfporder);
         return NULL;
     }
diff --git a/mm/slab.h b/mm/slab.h
index 03833b02b9ae..8620a0a1d5fa 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -406,6 +406,23 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
     percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
 }

+static inline int memcg_alloc_page_memcg_vec(struct page *page, gfp_t gfp,
+                         unsigned int objects)
+{
+    page->mem_cgroup_vec = kmalloc(sizeof(struct mem_cgroup_ptr *) *
+                       objects, gfp | __GFP_ZERO);
+    if (!page->mem_cgroup_vec)
+        return -ENOMEM;
+
+    return 0;
+}
+
+static inline void memcg_free_page_memcg_vec(struct page *page)
+{
+    kfree(page->mem_cgroup_vec);
+    page->mem_cgroup_vec = NULL;
+}
+
 extern void slab_init_memcg_params(struct kmem_cache *);
 extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg);

@@ -455,6 +472,16 @@ static inline void memcg_uncharge_slab(struct page *page, int order,
 {
 }

+static inline int memcg_alloc_page_memcg_vec(struct page *page, gfp_t gfp,
+                         unsigned int objects)
+{
+    return 0;
+}
+
+static inline void memcg_free_page_memcg_vec(struct page *page)
+{
+}
+
 static inline void slab_init_memcg_params(struct kmem_cache *s)
 {
 }
@@ -479,14 +506,21 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)

 static __always_inline int charge_slab_page(struct page *page,
                         gfp_t gfp, int order,
-                        struct kmem_cache *s)
+                        struct kmem_cache *s,
+                        unsigned int objects)
 {
+    int ret;
+
     if (is_root_cache(s)) {
         mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
                     PAGE_SIZE << order);
         return 0;
     }

+    ret = memcg_alloc_page_memcg_vec(page, gfp, objects);
+    if (ret)
+        return ret;
+
     return memcg_charge_slab(page, gfp, order, s);
 }

@@ -499,6 +533,7 @@ static __always_inline void uncharge_slab_page(struct page *page, int order,
         return;
     }

+    memcg_free_page_memcg_vec(page);
     memcg_uncharge_slab(page, order, s);
 }

diff --git a/mm/slub.c b/mm/slub.c
index bd902d65a71c..e810582f5b86 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1518,7 +1518,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
     else
         page = __alloc_pages_node(node, flags, order);

-    if (page && charge_slab_page(page, flags, order, s)) {
+    if (page && charge_slab_page(page, flags, order, s, oo_objects(oo))) {
         __free_pages(page, order);
         page = NULL;
     }
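The resulting layout can be sketched in user space as one pointer slot per
object, reachable from the page itself; calloc() stands in for kmalloc()
with __GFP_ZERO, and all names are illustrative:

#include <stdio.h>
#include <stdlib.h>

struct mem_cgroup_ptr;          /* opaque here; see patch 01 */

/*
 * Stand-in for struct page: for slab pages the patch reuses the
 * mem_cgroup field to point at a vector of per-object owners.
 */
struct page {
    struct mem_cgroup_ptr **mem_cgroup_vec;
};

static int alloc_memcg_vec(struct page *page, unsigned int objects)
{
    /* kmalloc(sizeof(ptr) * objects, gfp | __GFP_ZERO) analogue */
    page->mem_cgroup_vec = calloc(objects, sizeof(struct mem_cgroup_ptr *));
    return page->mem_cgroup_vec ? 0 : -1;
}

static void free_memcg_vec(struct page *page)
{
    free(page->mem_cgroup_vec);
    page->mem_cgroup_vec = NULL;
}

int main(void)
{
    struct page page;
    unsigned int objects = 32;  /* e.g. cachep->num for SLAB */

    if (alloc_memcg_vec(&page, objects))
        return 1;

    /* Slot i will later hold the owner of object i (patch 06). */
    printf("slot 0 starts out %s\n",
           page.mem_cgroup_vec[0] ? "set" : "NULL");

    free_memcg_vec(&page);
    return 0;
}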
From patchwork Fri Oct 18 00:28:09 2019
X-Patchwork-Submitter: Roman Gushchin <guro@fb.com>
X-Patchwork-Id: 11197383
From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org
Cc: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org, kernel-team@fb.com, Shakeel Butt <shakeelb@google.com>, Vladimir Davydov <vdavydov.dev@gmail.com>, Waiman Long <longman@redhat.com>, Christoph Lameter <cl@linux.com>, Roman Gushchin <guro@fb.com>
Subject: [PATCH 05/16] mm: slub: implement SLUB version of obj_to_index()
Date: Thu, 17 Oct 2019 17:28:09 -0700
Message-ID: <20191018002820.307763-6-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

This commit implements a SLUB version of the obj_to_index() function,
which will be required to calculate the offset of a memcg_ptr in the
mem_cgroup_vec when storing/obtaining the memcg ownership data.

To make it fast, let's repeat the SLAB trick introduced by commit
6a2d7a955d8d ("[PATCH] SLAB: use a multiply instead of a divide in
obj_to_index()") and avoid an expensive division.
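The trick replaces "offset / size" with a multiplication by a precomputed
fixed-point inverse. A minimal user-space version (this is the simple
32.32 variant, not the exact algorithm from <linux/reciprocal_div.h>; it
is exact here because slab offsets stay far below 2^32 / size):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Precompute ceil(2^32 / size) once, at cache-creation time. */
static uint64_t reciprocal(uint32_t size)
{
    return ((1ULL << 32) + size - 1) / size;
}

/* obj_to_index analogue: offset / size via one multiply + shift. */
static uint32_t obj_index(uint32_t offset, uint64_t recip)
{
    return (uint32_t)((offset * recip) >> 32);
}

int main(void)
{
    uint32_t size = 192;        /* object size in a cache */
    uint64_t recip = reciprocal(size);

    /* Check it against the plain division for a range of offsets. */
    for (uint32_t off = 0; off < 1 << 16; off++)
        assert(obj_index(off, recip) == off / size);

    printf("index of offset %u: %u\n", 5 * size, obj_index(5 * size, recip));
    return 0;
}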
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/slub_def.h | 9 +++++++++
 mm/slub.c                | 1 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..200ea292f250 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -8,6 +8,7 @@
  * (C) 2007 SGI, Christoph Lameter
  */
 #include <linux/kobject.h>
+#include <linux/reciprocal_div.h>
 
 enum stat_item {
 	ALLOC_FASTPATH,		/* Allocation from cpu slab */
@@ -86,6 +87,7 @@ struct kmem_cache {
 	unsigned long min_partial;
 	unsigned int size;	/* The size of an object including metadata */
 	unsigned int object_size;/* The size of an object without metadata */
+	struct reciprocal_value reciprocal_size;
 	unsigned int offset;	/* Free pointer offset */
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	/* Number of per cpu partial objects to keep around */
@@ -182,4 +184,11 @@ static inline void *nearest_obj(struct kmem_cache *cache, struct page *page,
 	return result;
 }
 
+static inline unsigned int obj_to_index(const struct kmem_cache *cache,
+					const struct page *page, void *obj)
+{
+	return reciprocal_divide(kasan_reset_tag(obj) - page_address(page),
+				 cache->reciprocal_size);
+}
+
 #endif /* _LINUX_SLUB_DEF_H */
diff --git a/mm/slub.c b/mm/slub.c
index e810582f5b86..557ea45a5d75 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3600,6 +3600,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	 */
 	size = ALIGN(size, s->align);
 	s->size = size;
+	s->reciprocal_size = reciprocal_value(size);
 	if (forced_order >= 0)
 		order = forced_order;
 	else
From patchwork Fri Oct 18 00:28:10 2019
From: Roman Gushchin <guro@fb.com>
Subject: [PATCH 06/16] mm: memcg/slab: save memcg ownership data for non-root slab objects
Date: Thu, 17 Oct 2019 17:28:10 -0700
Message-ID: <20191018002820.307763-7-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>

Store a memcg_ptr in the corresponding place of the mem_cgroup_vec for
each allocated non-root slab object. Make sure that each allocated object
holds a reference to the mem_cgroup_ptr.

To get the memcg_ptr in the post-alloc hook, we need a memcg pointer.
Because all memory cgroups will soon share the same set of kmem_caches,
let's not use kmem_cache->memcg_params.memcg. Instead, let's pass the
pointer directly from memcg_kmem_get_cache(). This guarantees that the
same cgroup is used in the pre- and post-alloc hooks.

Please note that the code is a bit bulky now, because we have to manage
three types of objects with reference counters: memcg, kmem_cache and
memcg_ptr. The following commits in the series will simplify it.
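In condensed form, the resulting allocation path looks roughly like this
(a sketch, not the literal kernel code; allocate_from_cache() is a
hypothetical stand-in for the cache-specific fast/slow paths):

	void *accounted_alloc_sketch(struct kmem_cache *s, gfp_t flags)
	{
		struct mem_cgroup *memcg = NULL;
		void *obj;

		/* picks the per-memcg cache and hands the memcg back */
		s = slab_pre_alloc_hook(s, &memcg, 1, flags);
		if (!s)
			return NULL;

		obj = allocate_from_cache(s, flags);	/* hypothetical helper */

		/* records ownership using the memcg selected above:
		 * page->mem_cgroup_vec[obj_to_index(s, page, obj)] = memcg_ptr */
		slab_post_alloc_hook(s, memcg, flags, 1, &obj);
		return obj;
	}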
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h |  3 +-
 mm/memcontrol.c            |  8 +++--
 mm/slab.c                  | 18 ++++++-----
 mm/slab.h                  | 64 ++++++++++++++++++++++++++++++++++----
 mm/slub.c                  | 14 ++++++---
 5 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index da864fded297..f4cb844005a5 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1397,7 +1397,8 @@ static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg,
 }
 #endif
 
-struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep);
+struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep,
+					struct mem_cgroup **memcgp);
 void memcg_kmem_put_cache(struct kmem_cache *cachep);
 
 #ifdef CONFIG_MEMCG_KMEM
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9303e98b0718..47a30db94869 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3023,7 +3023,8 @@ static inline bool memcg_kmem_bypass(void)
 * done with it, memcg_kmem_put_cache() must be called to release the
 * reference.
 */
-struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
+struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep,
+					struct mem_cgroup **memcgp)
 {
 	struct mem_cgroup *memcg;
 	struct kmem_cache *memcg_cachep;
@@ -3079,8 +3080,11 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
 	 */
 	if (unlikely(!memcg_cachep))
 		memcg_schedule_kmem_cache_create(memcg, cachep);
-	else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt))
+	else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt)) {
+		css_get(&memcg->css);
+		*memcgp = memcg;
 		cachep = memcg_cachep;
+	}
 out_unlock:
 	rcu_read_unlock();
 	return cachep;
diff --git a/mm/slab.c b/mm/slab.c
index ffa16dd966ef..91cd8bc4ee07 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3223,9 +3223,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	unsigned long save_flags;
 	void *ptr;
 	int slab_node = numa_mem_id();
+	struct mem_cgroup *memcg = NULL;
 
 	flags &= gfp_allowed_mask;
-	cachep = slab_pre_alloc_hook(cachep, flags);
+	cachep = slab_pre_alloc_hook(cachep, &memcg, 1, flags);
 	if (unlikely(!cachep))
 		return NULL;
 
@@ -3261,7 +3262,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
-	slab_post_alloc_hook(cachep, flags, 1, &ptr);
+	slab_post_alloc_hook(cachep, memcg, flags, 1, &ptr);
 	return ptr;
 }
 
@@ -3302,9 +3303,10 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 {
 	unsigned long save_flags;
 	void *objp;
+	struct mem_cgroup *memcg = NULL;
 
 	flags &= gfp_allowed_mask;
-	cachep = slab_pre_alloc_hook(cachep, flags);
+	cachep = slab_pre_alloc_hook(cachep, &memcg, 1, flags);
 	if (unlikely(!cachep))
 		return NULL;
 
@@ -3318,7 +3320,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
-	slab_post_alloc_hook(cachep, flags, 1, &objp);
+	slab_post_alloc_hook(cachep, memcg, flags, 1, &objp);
 	return objp;
 }
 
@@ -3440,6 +3442,7 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
 		memset(objp, 0, cachep->object_size);
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
+	memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp);
 
 	/*
 	 * Skip calling cache_free_alien() when the platform is not numa.
@@ -3505,8 +3508,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			  void **p)
 {
 	size_t i;
+	struct mem_cgroup *memcg = NULL;
 
-	s = slab_pre_alloc_hook(s, flags);
+	s = slab_pre_alloc_hook(s, &memcg, size, flags);
 	if (!s)
 		return 0;
 
@@ -3529,13 +3533,13 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	for (i = 0; i < size; i++)
 		memset(p[i], 0, s->object_size);
 
-	slab_post_alloc_hook(s, flags, size, p);
+	slab_post_alloc_hook(s, memcg, flags, size, p);
 	/* FIXME: Trace call missing. Christoph would like a bulk variant */
 	return size;
 error:
 	local_irq_enable();
 	cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
-	slab_post_alloc_hook(s, flags, i, p);
+	slab_post_alloc_hook(s, memcg, flags, i, p);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;
 }
diff --git a/mm/slab.h b/mm/slab.h
index 8620a0a1d5fa..28feabed1e9a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -423,6 +423,45 @@ static inline void memcg_free_page_memcg_vec(struct page *page)
 	page->mem_cgroup_vec = NULL;
 }
 
+static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
+					      struct mem_cgroup *memcg,
+					      size_t size, void **p)
+{
+	struct mem_cgroup_ptr *memcg_ptr;
+	struct page *page;
+	unsigned long off;
+	size_t i;
+
+	memcg_ptr = mem_cgroup_get_kmem_ptr(memcg);
+	for (i = 0; i < size; i++) {
+		if (likely(p[i])) {
+			page = virt_to_head_page(p[i]);
+			off = obj_to_index(s, page, p[i]);
+			mem_cgroup_ptr_get(memcg_ptr);
+			page->mem_cgroup_vec[off] = memcg_ptr;
+		}
+	}
+	mem_cgroup_ptr_put(memcg_ptr);
+	mem_cgroup_put(memcg);
+
+	memcg_kmem_put_cache(s);
+}
+
+static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
+					void *p)
+{
+	struct mem_cgroup_ptr *memcg_ptr;
+	unsigned int off;
+
+	if (!memcg_kmem_enabled() || is_root_cache(s))
+		return;
+
+	off = obj_to_index(s, page, p);
+	memcg_ptr = page->mem_cgroup_vec[off];
+	page->mem_cgroup_vec[off] = NULL;
+	mem_cgroup_ptr_put(memcg_ptr);
+}
+
 extern void slab_init_memcg_params(struct kmem_cache *);
 extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg);
 
@@ -482,6 +521,17 @@ static inline void memcg_free_page_memcg_vec(struct page *page)
 {
 }
 
+static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
+					      struct mem_cgroup *memcg,
+					      size_t size, void **p)
+{
+}
+
+static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
+					void *p)
+{
+}
+
 static inline void slab_init_memcg_params(struct kmem_cache *s)
 {
 }
@@ -591,7 +641,8 @@ static inline size_t slab_ksize(const struct kmem_cache *s)
 }
 
 static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
-						     gfp_t flags)
+						     struct mem_cgroup **memcgp,
+						     size_t size, gfp_t flags)
 {
 	flags &= gfp_allowed_mask;
 
@@ -605,13 +656,14 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 
 	if (memcg_kmem_enabled() &&
 	    ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
-		return memcg_kmem_get_cache(s);
+		return memcg_kmem_get_cache(s, memcgp);
 
 	return s;
 }
 
-static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
-					size_t size, void **p)
+static inline void slab_post_alloc_hook(struct kmem_cache *s,
+					struct mem_cgroup *memcg,
+					gfp_t flags, size_t size, void **p)
 {
 	size_t i;
 
@@ -623,8 +675,8 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
 						 s->flags, flags);
 	}
 
-	if (memcg_kmem_enabled())
-		memcg_kmem_put_cache(s);
+	if (!is_root_cache(s))
+		memcg_slab_post_alloc_hook(s, memcg, size, p);
 }
 
 #ifndef CONFIG_SLOB
diff --git a/mm/slub.c b/mm/slub.c
index 557ea45a5d75..a62545c7acac 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2700,8 +2700,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 	struct kmem_cache_cpu *c;
 	struct page *page;
 	unsigned long tid;
+	struct mem_cgroup *memcg = NULL;
 
-	s = slab_pre_alloc_hook(s, gfpflags);
+	s = slab_pre_alloc_hook(s, &memcg, 1, gfpflags);
 	if (!s)
 		return NULL;
 redo:
@@ -2777,7 +2778,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(object, 0, s->object_size);
 
-	slab_post_alloc_hook(s, gfpflags, 1, &object);
+	slab_post_alloc_hook(s, memcg, gfpflags, 1, &object);
 
 	return object;
 }
@@ -2982,6 +2983,8 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
 	void *tail_obj = tail ? : head;
 	struct kmem_cache_cpu *c;
 	unsigned long tid;
+
+	memcg_slab_free_hook(s, page, head);
 redo:
 	/*
 	 * Determine the currently cpus per cpu slab.
@@ -3159,9 +3162,10 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 {
 	struct kmem_cache_cpu *c;
 	int i;
+	struct mem_cgroup *memcg = NULL;
 
 	/* memcg and kmem_cache debug support */
-	s = slab_pre_alloc_hook(s, flags);
+	s = slab_pre_alloc_hook(s, &memcg, size, flags);
 	if (unlikely(!s))
 		return false;
 	/*
@@ -3206,11 +3210,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	}
 
 	/* memcg and kmem_cache debug support */
-	slab_post_alloc_hook(s, flags, size, p);
+	slab_post_alloc_hook(s, memcg, flags, size, p);
 	return i;
 error:
 	local_irq_enable();
-	slab_post_alloc_hook(s, flags, i, p);
+	slab_post_alloc_hook(s, memcg, flags, i, p);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;
 }
From patchwork Fri Oct 18 00:28:11 2019
From: Roman Gushchin <guro@fb.com>
Subject: [PATCH 07/16] mm: memcg: move memcg_kmem_bypass() to memcontrol.h
Date: Thu, 17 Oct 2019 17:28:11 -0700
Message-ID: <20191018002820.307763-8-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
To make the memcg_kmem_bypass() function available outside of
memcontrol.c, let's move it to memcontrol.h. The function is small and
a good fit for the existing set of static inline helpers there. It will
be used from the slab code.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 7 +++++++
 mm/memcontrol.c            | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f4cb844005a5..7a6713dc52ce 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1432,6 +1432,13 @@ static inline bool memcg_kmem_enabled(void)
 	return static_branch_unlikely(&memcg_kmem_enabled_key);
 }
 
+static inline bool memcg_kmem_bypass(void)
+{
+	if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
+		return true;
+	return false;
+}
+
 static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
 {
 	if (memcg_kmem_enabled())
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 47a30db94869..d9d5f77a783d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3000,13 +3000,6 @@ static void memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
 	queue_work(memcg_kmem_cache_wq, &cw->work);
 }
 
-static inline bool memcg_kmem_bypass(void)
-{
-	if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
-		return true;
-	return false;
-}
-
 /**
  * memcg_kmem_get_cache: select the correct per-memcg cache for allocation
  * @cachep: the original global kmem cache
From patchwork Fri Oct 18 00:28:12 2019
From: Roman Gushchin <guro@fb.com>
Subject: [PATCH 08/16] mm: memcg: introduce __mod_lruvec_memcg_state()
Date: Thu, 17 Oct 2019 17:28:12 -0700
Message-ID: <20191018002820.307763-9-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>

To prepare for per-object accounting of slab objects, let's introduce the
__mod_lruvec_memcg_state() and mod_lruvec_memcg_state() helpers, which
are similar to mod_lruvec_state(), but do not update the global node
counters, only the per-cgroup and per-lruvec ones.

This is necessary because the node slab counters will soon be used to
account all memory used by slab pages, whereas at the memcg level only
the memory actually used by allocated objects will be counted. Free space
in slab pages is shared between all cgroups, so it can't be accounted to
any specific one.
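As a worked example of the split (all numbers hypothetical; PAGE_BYTES and
OBJ_BYTES are illustrative names): an order-0 slab page hosting 96-byte
objects bumps the node-level slab counter by the full page once, while the
per-cgroup counters only ever move in object-size steps:

	/* sketch of who updates what after this patch */
	#define PAGE_BYTES 4096
	#define OBJ_BYTES  96

	/* charge_slab_page(): node-level counter, whole page, once */
	long node_slab_delta = PAGE_BYTES;

	/* post-alloc hook: per-cgroup/per-lruvec counters, per live object,
	 * via the new __mod_lruvec_memcg_state() helper */
	long memcg_slab_delta(long live_objects)
	{
		return live_objects * OBJ_BYTES;  /* free slots: no cgroup */
	}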
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 22 ++++++++++++++++++++++
 mm/memcontrol.c            | 37 +++++++++++++++++++++++++++----------
 2 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7a6713dc52ce..bf1fcf85abd8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -738,6 +738,8 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 
 void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
+void __mod_lruvec_memcg_state(struct lruvec *lruvec, enum node_stat_item idx,
+			      int val);
 void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
 
 static inline void mod_lruvec_state(struct lruvec *lruvec,
@@ -750,6 +752,16 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 	local_irq_restore(flags);
 }
 
+static inline void mod_lruvec_memcg_state(struct lruvec *lruvec,
+					  enum node_stat_item idx, int val)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_lruvec_memcg_state(lruvec, idx, val);
+	local_irq_restore(flags);
+}
+
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
@@ -1142,6 +1154,16 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
 }
 
+static inline void __mod_lruvec_memcg_state(struct lruvec *lruvec,
+					    enum node_stat_item idx, int val)
+{
+}
+
+static inline void mod_lruvec_memcg_state(struct lruvec *lruvec,
+					  enum node_stat_item idx, int val)
+{
+}
+
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d9d5f77a783d..ebaa4ff36e0c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -787,16 +787,16 @@ parent_nodeinfo(struct mem_cgroup_per_node *pn, int nid)
 }
 
 /**
- * __mod_lruvec_state - update lruvec memory statistics
+ * __mod_lruvec_memcg_state - update lruvec memory statistics
  * @lruvec: the lruvec
  * @idx: the stat item
  * @val: delta to add to the counter, can be negative
  *
  * The lruvec is the intersection of the NUMA node and a cgroup. This
- * function updates the all three counters that are affected by a
- * change of state at this level: per-node, per-cgroup, per-lruvec.
+ * function updates the two of three counters that are affected by a
+ * change of state at this level: per-cgroup and per-lruvec.
 */
-void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+void __mod_lruvec_memcg_state(struct lruvec *lruvec, enum node_stat_item idx,
 		int val)
 {
 	pg_data_t *pgdat = lruvec_pgdat(lruvec);
@@ -804,12 +804,6 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	struct mem_cgroup *memcg;
 	long x, threshold = MEMCG_CHARGE_BATCH;
 
-	/* Update node */
-	__mod_node_page_state(pgdat, idx, val);
-
-	if (mem_cgroup_disabled())
-		return;
-
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
@@ -833,6 +827,29 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	__this_cpu_write(pn->lruvec_stat_cpu->count[idx], x);
 }
 
+/**
+ * __mod_lruvec_state - update lruvec memory statistics
+ * @lruvec: the lruvec
+ * @idx: the stat item
+ * @val: delta to add to the counter, can be negative
+ *
+ * The lruvec is the intersection of the NUMA node and a cgroup. This
+ * function updates the all three counters that are affected by a
+ * change of state at this level: per-node, per-cgroup, per-lruvec.
+ */
+void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+			int val)
+{
+	pg_data_t *pgdat = lruvec_pgdat(lruvec);
+
+	/* Update node */
+	__mod_node_page_state(pgdat, idx, val);
+
+	/* Update per-cgroup and per-lruvec stats */
+	if (!mem_cgroup_disabled())
+		__mod_lruvec_memcg_state(lruvec, idx, val);
+}
+
 void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val)
 {
 	struct page *page = virt_to_head_page(p);
From patchwork Fri Oct 18 00:28:13 2019
From: Roman Gushchin <guro@fb.com>
Subject: [PATCH 09/16] mm: memcg/slab: charge individual slab objects instead of pages
Date: Thu, 17 Oct 2019 17:28:13 -0700
Message-ID: <20191018002820.307763-10-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>

Switch to per-object accounting of non-root slab objects.

Charging is performed using the subpage charging API in the pre-alloc
hook. If the amount of memory has been charged successfully, we proceed
with the actual allocation. Otherwise, -ENOMEM is returned.

In the post-alloc hook we check whether the actual allocation succeeded.
If so, the corresponding vmstats are bumped and the memcg membership
information is recorded. Otherwise, the charge is canceled.

On the free path we look up the memcg membership information, decrement
the stats and uncharge. No operations are performed for root kmem_caches.

The global per-node slab-related vmstats NR_SLAB_(UN)RECLAIMABLE_B are
still modified from the (un)charge_slab_page() functions. The idea is to
keep all slab pages accounted as slab pages at the system level. Memcg
and lruvec counters now represent only the memory used by actual slab
objects and do not include free space. Free space is shared and doesn't
belong to any specific cgroup.
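The resulting per-object lifecycle can be summarized with the following
sketch (hedged pseudo-C, not the literal kernel code: raw_slab_alloc() is
a hypothetical stand-in for the actual allocation, and error handling is
trimmed):

	static void *alloc_one_object(struct kmem_cache *s,
				      struct mem_cgroup *memcg, gfp_t gfp)
	{
		void *p;

		/* pre-alloc hook: charge the object size up front */
		if (__memcg_kmem_charge_subpage(memcg, s->size, gfp))
			return NULL;		/* -ENOMEM before allocating */

		p = raw_slab_alloc(s, gfp);	/* hypothetical helper */
		if (!p) {
			/* post-alloc hook: cancel the charge on failure */
			__memcg_kmem_uncharge_subpage(memcg, s->size);
			return NULL;
		}
		/* on success: record ownership and bump memcg vmstats by
		 * s->size; the free path later reverses both and uncharges */
		return p;
	}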
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/slab.h | 152 ++++++++++++++++++++----------------------------------
 1 file changed, 57 insertions(+), 95 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 28feabed1e9a..0f2f712de77a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -340,72 +340,6 @@ static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
 	return NULL;
 }
 
-/*
- * Charge the slab page belonging to the non-root kmem_cache.
- * Can be called for non-root kmem_caches only.
- */
-static __always_inline int memcg_charge_slab(struct page *page,
-					     gfp_t gfp, int order,
-					     struct kmem_cache *s)
-{
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-	int ret;
-
-	rcu_read_lock();
-	memcg = READ_ONCE(s->memcg_params.memcg);
-	while (memcg && !css_tryget_online(&memcg->css))
-		memcg = parent_mem_cgroup(memcg);
-	rcu_read_unlock();
-
-	if (unlikely(!memcg || mem_cgroup_is_root(memcg))) {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    (PAGE_SIZE << order));
-		percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
-		return 0;
-	}
-
-	ret = memcg_kmem_charge_memcg(page, gfp, order, memcg);
-	if (ret)
-		goto out;
-
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-	mod_lruvec_state(lruvec, cache_vmstat_idx(s), PAGE_SIZE << order);
-
-	/* transer try_charge() page references to kmem_cache */
-	percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
-	css_put_many(&memcg->css, 1 << order);
-out:
-	css_put(&memcg->css);
-	return ret;
-}
-
-/*
- * Uncharge a slab page belonging to a non-root kmem_cache.
- * Can be called for non-root kmem_caches only.
- */
-static __always_inline void memcg_uncharge_slab(struct page *page, int order,
-						struct kmem_cache *s)
-{
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	rcu_read_lock();
-	memcg = READ_ONCE(s->memcg_params.memcg);
-	if (likely(!mem_cgroup_is_root(memcg))) {
-		lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-		mod_lruvec_state(lruvec, cache_vmstat_idx(s),
-				 -(PAGE_SIZE << order));
-		memcg_kmem_uncharge_memcg(page, order, memcg);
-	} else {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    -(PAGE_SIZE << order));
-	}
-	rcu_read_unlock();
-
-	percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
-}
-
 static inline int memcg_alloc_page_memcg_vec(struct page *page, gfp_t gfp,
 					     unsigned int objects)
 {
@@ -423,11 +357,31 @@ static inline void memcg_free_page_memcg_vec(struct page *page)
 	page->mem_cgroup_vec = NULL;
 }
 
+static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+							    struct mem_cgroup **memcgp,
+							    size_t size, gfp_t flags)
+{
+	struct kmem_cache *cachep;
+
+	cachep = memcg_kmem_get_cache(s, memcgp);
+	if (is_root_cache(cachep))
+		return s;
+
+	if (__memcg_kmem_charge_subpage(*memcgp, size * s->size, flags)) {
+		mem_cgroup_put(*memcgp);
+		memcg_kmem_put_cache(cachep);
+		cachep = NULL;
+	}
+
+	return cachep;
+}
+
 static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct mem_cgroup *memcg,
 					      size_t size, void **p)
 {
 	struct mem_cgroup_ptr *memcg_ptr;
+	struct lruvec *lruvec;
 	struct page *page;
 	unsigned long off;
 	size_t i;
@@ -439,6 +393,11 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 			off = obj_to_index(s, page, p[i]);
 			mem_cgroup_ptr_get(memcg_ptr);
 			page->mem_cgroup_vec[off] = memcg_ptr;
+			lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
+			mod_lruvec_memcg_state(lruvec, cache_vmstat_idx(s),
+					       s->size);
+		} else {
+			__memcg_kmem_uncharge_subpage(memcg, s->size);
 		}
 	}
 	mem_cgroup_ptr_put(memcg_ptr);
@@ -451,6 +410,8 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 					void *p)
 {
 	struct mem_cgroup_ptr *memcg_ptr;
+	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
 	unsigned int off;
 
 	if (!memcg_kmem_enabled() || is_root_cache(s))
@@ -459,6 +420,14 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
 	off = obj_to_index(s, page, p);
 	memcg_ptr = page->mem_cgroup_vec[off];
 	page->mem_cgroup_vec[off] = NULL;
+	rcu_read_lock();
+	memcg = memcg_ptr->memcg;
+	if (likely(!mem_cgroup_is_root(memcg))) {
+		__memcg_kmem_uncharge_subpage(memcg, s->size);
+		lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
+		mod_lruvec_memcg_state(lruvec, cache_vmstat_idx(s), -s->size);
+	}
+	rcu_read_unlock();
 	mem_cgroup_ptr_put(memcg_ptr);
 }
 
@@ -500,17 +469,6 @@ static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
 	return NULL;
 }
 
-static inline int memcg_charge_slab(struct page *page, gfp_t gfp, int order,
-				    struct kmem_cache *s)
-{
-	return 0;
-}
-
-static inline void memcg_uncharge_slab(struct page *page, int order,
-				       struct kmem_cache *s)
-{
-}
-
 static inline int memcg_alloc_page_memcg_vec(struct page *page, gfp_t gfp,
 					     unsigned int objects)
 {
@@ -521,6 +479,13 @@ static inline void memcg_free_page_memcg_vec(struct page *page)
 {
 }
 
+static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+							    struct mem_cgroup **memcgp,
+							    size_t size, gfp_t flags)
+{
+	return NULL;
+}
+
 static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 					      struct mem_cgroup *memcg,
 					      size_t size, void **p)
@@ -561,30 +526,27 @@ static __always_inline int charge_slab_page(struct page *page,
 {
 	int ret;
 
-	if (is_root_cache(s)) {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    PAGE_SIZE << order);
-		return 0;
-	}
-
-	ret = memcg_alloc_page_memcg_vec(page, gfp, objects);
-	if (ret)
-		return ret;
+	if (!is_root_cache(s)) {
+		ret = memcg_alloc_page_memcg_vec(page, gfp, objects);
+		if (ret)
+			return ret;
 
-	return memcg_charge_slab(page, gfp, order, s);
+		percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order);
+	}
+	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
+			    PAGE_SIZE << order);
+	return 0;
 }
 
 static __always_inline void uncharge_slab_page(struct page *page, int order,
 					       struct kmem_cache *s)
 {
-	if (is_root_cache(s)) {
-		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
-				    -(PAGE_SIZE << order));
-		return;
+	if (!is_root_cache(s)) {
+		memcg_free_page_memcg_vec(page);
+		percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
 	}
-
-	memcg_free_page_memcg_vec(page);
-	memcg_uncharge_slab(page, order, s);
+	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
+			    -(PAGE_SIZE << order));
 }
 
 static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
@@ -656,7 +618,7 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 
 	if (memcg_kmem_enabled() &&
 	    ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
-		return memcg_kmem_get_cache(s, memcgp);
+		return memcg_slab_pre_alloc_hook(s, memcgp, size, flags);
 
 	return s;
 }
From patchwork Fri Oct 18 00:28:14 2019
From: Roman Gushchin <guro@fb.com>
Subject: [PATCH 10/16] mm: memcg: move get_mem_cgroup_from_current() to memcontrol.h
Date: Thu, 17 Oct 2019 17:28:14 -0700
Message-ID: <20191018002820.307763-11-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>

Move get_mem_cgroup_from_current() to memcontrol.h to make it available
for use outside of memcontrol.c. The function is small and marked
__always_inline, so it's better placed in the header.
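For context, current->active_memcg is what remote charging is built on;
usage at the time looked roughly like this (a sketch based on the
memalloc_use_memcg() API of that era, not code from this patch):

	memalloc_use_memcg(memcg);	/* current->active_memcg = memcg */
	ptr = kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);
	memalloc_unuse_memcg();		/* back to charging current->mm */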
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 17 +++++++++++++++++
 mm/memcontrol.c            | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bf1fcf85abd8..a75980559a47 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -453,6 +453,23 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
 
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page);
 
+/**
+ * If current->active_memcg is non-NULL, do not fallback to current->mm->memcg.
+ */
+static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	if (unlikely(current->active_memcg)) {
+		struct mem_cgroup *memcg = root_mem_cgroup;
+
+		rcu_read_lock();
+		if (css_tryget_online(&current->active_memcg->css))
+			memcg = current->active_memcg;
+		rcu_read_unlock();
+		return memcg;
+	}
+	return get_mem_cgroup_from_mm(current->mm);
+}
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ebaa4ff36e0c..8a753a336efd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1074,23 +1074,6 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_page);
 
-/**
- * If current->active_memcg is non-NULL, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (unlikely(current->active_memcg)) {
-		struct mem_cgroup *memcg = root_mem_cgroup;
-
-		rcu_read_lock();
-		if (css_tryget_online(&current->active_memcg->css))
-			memcg = current->active_memcg;
-		rcu_read_unlock();
-		return memcg;
-	}
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
From patchwork Fri Oct 18 00:28:15 2019
From: Roman Gushchin <guro@fb.com>
Subject: [PATCH 11/16] mm: memcg/slab: replace memcg_from_slab_page() with memcg_from_slab_obj()
Date: Thu, 17 Oct 2019 17:28:15 -0700
Message-ID: <20191018002820.307763-12-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>

On our way to sharing slab pages between multiple memory cgroups, let's
make sure we don't use the kmem_cache.memcg_params.memcg pointer to
determine memcg ownership of a slab object or page.

Let's transform memcg_from_slab_page() into memcg_from_slab_obj(), which
relies on the memcg ownership data stored in page->mem_cgroup_vec. Delete
mem_cgroup_from_kmem() and use memcg_from_slab_obj() instead.

Note: memcg_from_slab_obj() returns NULL if the slab object belongs to
the root cgroup, so the redundant check in __mod_lruvec_slab_state() can
be removed.
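The new lookup goes pointer -> head page -> object index -> per-object
vector, roughly like the following sketch (owner_of() is an illustrative
name; the NULL and enablement checks of the real helper are omitted):

	struct mem_cgroup *owner_of(void *ptr)
	{
		struct page *page = virt_to_head_page(ptr);
		unsigned int off = obj_to_index(page->slab_cache, page, ptr);

		/* the real code returns NULL early for root-cache objects */
		return page->mem_cgroup_vec[off]->memcg;
	}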
X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 mlxlogscore=831 lowpriorityscore=0 mlxscore=0 suspectscore=1 phishscore=0 clxscore=1015 adultscore=0 spamscore=0 bulkscore=0 priorityscore=1501 malwarescore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-1908290000 definitions=main-1910180001 X-FB-Internal: deliver X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On our way to share slab pages between multiple memory cgroups let's make sure we don't use kmem_cache.memcg_params.memcg pointer to determine memcg ownership of a slab object/page. Let's transform memcg_from_slab_page() into memcg_from_slab_obj(), which relies on memcg ownership data stored in page->mem_cgroup_vec. Delete mem_cgroup_from_kmem() and use memcg_from_slab_obj() instead. Note: memcg_from_slab_obj() returns NULL if slab obj belongs to the root cgroup, so remove the redundant check in __mod_lruvec_slab_state(). Signed-off-by: Roman Gushchin --- mm/list_lru.c | 12 +----------- mm/memcontrol.c | 9 ++++----- mm/slab.h | 21 +++++++++++++-------- 3 files changed, 18 insertions(+), 24 deletions(-) diff --git a/mm/list_lru.c b/mm/list_lru.c index 0f1f6b06b7f3..4f9d791b802c 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -57,16 +57,6 @@ list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx) return &nlru->lru; } -static __always_inline struct mem_cgroup *mem_cgroup_from_kmem(void *ptr) -{ - struct page *page; - - if (!memcg_kmem_enabled()) - return NULL; - page = virt_to_head_page(ptr); - return memcg_from_slab_page(page); -} - static inline struct list_lru_one * list_lru_from_kmem(struct list_lru_node *nlru, void *ptr, struct mem_cgroup **memcg_ptr) @@ -77,7 +67,7 @@ list_lru_from_kmem(struct list_lru_node *nlru, void *ptr, if (!nlru->memcg_lrus) goto out; - memcg = mem_cgroup_from_kmem(ptr); + memcg = memcg_from_slab_obj(ptr); if (!memcg) goto out; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8a753a336efd..1982b14d6e6f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -548,7 +548,7 @@ ino_t page_cgroup_ino(struct page *page) rcu_read_lock(); if (PageHead(page) && PageSlab(page)) - memcg = memcg_from_slab_page(page); + memcg = root_mem_cgroup; else memcg = READ_ONCE(page->mem_cgroup); while (memcg && !(memcg->css.flags & CSS_ONLINE)) @@ -852,16 +852,15 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val) { - struct page *page = virt_to_head_page(p); - pg_data_t *pgdat = page_pgdat(page); + pg_data_t *pgdat = page_pgdat(virt_to_page(p)); struct mem_cgroup *memcg; struct lruvec *lruvec; rcu_read_lock(); - memcg = memcg_from_slab_page(page); + memcg = memcg_from_slab_obj(p); /* Untracked pages have no memcg, no lruvec. Update only the node */ - if (!memcg || memcg == root_mem_cgroup) { + if (!memcg) { __mod_node_page_state(pgdat, idx, val); } else { lruvec = mem_cgroup_lruvec(pgdat, memcg); diff --git a/mm/slab.h b/mm/slab.h index 0f2f712de77a..a6330065d434 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -329,15 +329,20 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) * The kmem_cache can be reparented asynchronously. The caller must ensure * the memcg lifetime, e.g. by taking rcu_read_lock() or cgroup_mutex. 
 */
-static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
+static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
 {
-	struct kmem_cache *s;
-
-	s = READ_ONCE(page->slab_cache);
-	if (s && !is_root_cache(s))
-		return READ_ONCE(s->memcg_params.memcg);
+	struct mem_cgroup_ptr *memcg_ptr;
+	struct page *page;
+	unsigned int off;
 
-	return NULL;
+	if (!memcg_kmem_enabled())
+		return NULL;
+	page = virt_to_head_page(ptr);
+	if (is_root_cache(page->slab_cache))
+		return NULL;
+	off = obj_to_index(page->slab_cache, page, ptr);
+	memcg_ptr = page->mem_cgroup_vec[off];
+	return memcg_ptr->memcg;
 }
 
 static inline int memcg_alloc_page_memcg_vec(struct page *page, gfp_t gfp,
@@ -464,7 +469,7 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
 	return s;
 }
 
-static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
+static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
 {
 	return NULL;
 }
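
[The per-object lookup above leans on obj_to_index() to turn an object
pointer into a slot in page->mem_cgroup_vec. Conceptually it is plain
offset arithmetic; a simplified editor's sketch follows (the real helper
avoids the runtime division by using a precomputed reciprocal):]

	/* Sketch: slot index of an object inside its (head) slab page. */
	static inline unsigned int obj_to_index_sketch(struct kmem_cache *s,
						       struct page *page,
						       void *ptr)
	{
		return ((char *)ptr - (char *)page_address(page)) / s->size;
	}

[With one mem_cgroup_ptr per object slot, ownership can be read for any
object without consulting the kmem_cache at all, which is what allows a
single set of caches to serve every cgroup later in the series.]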
From patchwork Fri Oct 18 00:28:16 2019
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11197407
From: Roman Gushchin
To: linux-mm@kvack.org
Cc: Michal Hocko, Johannes Weiner, linux-kernel@vger.kernel.org,
 kernel-team@fb.com, Shakeel Butt, Vladimir Davydov, Waiman Long,
 Christoph Lameter, Tobin C. Harding, Tejun Heo
Subject: [PATCH 12/16] tools/cgroup: add slabinfo.py tool
Date: Thu, 17 Oct 2019 17:28:16 -0700
Message-ID: <20191018002820.307763-13-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

Add a drgn-based tool to display slab information for a given memcg. It
can replace the cgroup v1 memory.kmem.slabinfo interface on cgroup v2,
but in a more flexible way. It currently supports only the SLUB
configuration; SLAB support can be trivially added later.

Output example:

$ sudo ./tools/cgroup/slabinfo.py /sys/fs/cgroup/user.slice/user-111017.slice/user\@111017.service
shmem_inode_cache 92 92 704 46 8 : tunables 0 0 0 : slabdata 2 2 0
eventpoll_pwq 56 56 72 56 1 : tunables 0 0 0 : slabdata 1 1 0
eventpoll_epi 32 32 128 32 1 : tunables 0 0 0 : slabdata 1 1 0
kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-2048 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-64 128 128 64 64 1 : tunables 0 0 0 : slabdata 2 2 0
mm_struct 160 160 1024 32 8 : tunables 0 0 0 : slabdata 5 5 0
signal_cache 96 96 1024 32 8 : tunables 0 0 0 : slabdata 3 3 0
sighand_cache 45 45 2112 15 8 : tunables 0 0 0 : slabdata 3 3 0
files_cache 138 138 704 46 8 : tunables 0 0 0 : slabdata 3 3 0
task_delay_info 153 153 80 51 1 : tunables 0 0 0 : slabdata 3 3 0
task_struct 27 27 3520 9 8 : tunables 0 0 0 : slabdata 3 3 0
radix_tree_node 56 56 584 28 4 : tunables 0 0 0 : slabdata 2 2 0
btrfs_inode 140 140 1136 28 8 : tunables 0 0 0 : slabdata 5 5 0
kmalloc-1024 64 64 1024 32 8 : tunables 0 0 0 : slabdata 2 2 0
kmalloc-192 84 84 192 42 2 : tunables 0 0 0 : slabdata 2 2 0
inode_cache 54 54 600 27 4 : tunables 0 0 0 : slabdata 2 2 0
kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-512 32 32 512 32 4 : tunables 0 0 0 : slabdata 1 1 0
skbuff_head_cache 32 32 256 32 2 : tunables 0 0 0 : slabdata 1 1 0
sock_inode_cache 46 46 704 46 8 : tunables 0 0 0 : slabdata 1 1 0
cred_jar 378 378 192 42 2 : tunables 0 0 0 : slabdata 9 9 0
proc_inode_cache 96 96 672 24 4 : tunables 0 0 0 : slabdata 4 4 0
dentry 336 336 192 42 2 : tunables 0 0 0 : slabdata 8 8 0
filp 697 864 256 32 2 : tunables 0 0 0 : slabdata 27 27 0
anon_vma 644 644 88 46 1 : tunables 0 0 0 : slabdata 14 14 0
pid 1408 1408 64 64 1 : tunables 0 0 0 : slabdata 22 22 0
vm_area_struct 1200 1200 200 40 2 : tunables 0 0 0 : slabdata 30 30 0

Signed-off-by: Roman Gushchin
Cc: Waiman Long
Harding Cc: Tejun Heo --- tools/cgroup/slabinfo.py | 159 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100755 tools/cgroup/slabinfo.py diff --git a/tools/cgroup/slabinfo.py b/tools/cgroup/slabinfo.py new file mode 100755 index 000000000000..40b01a6ec4b0 --- /dev/null +++ b/tools/cgroup/slabinfo.py @@ -0,0 +1,159 @@ +#!/usr/bin/env drgn +# +# Copyright (C) 2019 Roman Gushchin +# Copyright (C) 2019 Facebook + +from os import stat +import argparse +import sys + +from drgn.helpers.linux import list_for_each_entry, list_empty +from drgn import container_of + + +DESC = """ +This is a drgn script to provide slab statistics for memory cgroups. +It supports cgroup v2 and v1 and can emulate memory.kmem.slabinfo +interface of cgroup v1. +For drgn, visit https://github.com/osandov/drgn. +""" + + +MEMCGS = {} + +OO_SHIFT = 16 +OO_MASK = ((1 << OO_SHIFT) - 1) + + +def err(s): + print('slabinfo.py: error: %s' % s, file=sys.stderr, flush=True) + sys.exit(1) + + +def find_memcg_ids(css=prog['root_mem_cgroup'].css, prefix=''): + if not list_empty(css.children.address_of_()): + for css in list_for_each_entry('struct cgroup_subsys_state', + css.children.address_of_(), + 'sibling'): + name = prefix + '/' + css.cgroup.kn.name.string_().decode('utf-8') + memcg = container_of(css, 'struct mem_cgroup', 'css') + MEMCGS[css.cgroup.kn.id.ino.value_()] = memcg + find_memcg_ids(css, name) + + +def is_root_cache(s): + return False if s.memcg_params.root_cache else True + + +def cache_name(s): + if is_root_cache(s): + return s.name + else: + return s.memcg_params.root_cache.name.string_().decode('utf-8') + + +# SLUB + +def oo_order(s): + return s.oo.x >> OO_SHIFT + + +def oo_objects(s): + return s.oo.x & OO_MASK + + +def count_partial(n, fn): + nr_pages = 0 + for page in list_for_each_entry('struct page', n.partial.address_of_(), + 'lru'): + nr_pages += fn(page) + return nr_pages + + +def count_free(page): + return page.objects - page.inuse + + +def slub_get_slabinfo(s, cfg): + nr_slabs = 0 + nr_objs = 0 + nr_free = 0 + + for node in range(cfg['nr_nodes']): + n = s.node[node] + nr_slabs += n.nr_slabs.counter.value_() + nr_objs += n.total_objects.counter.value_() + nr_free += count_partial(n, count_free) + + return {'active_objs': nr_objs - nr_free, + 'num_objs': nr_objs, + 'active_slabs': nr_slabs, + 'num_slabs': nr_slabs, + 'objects_per_slab': oo_objects(s), + 'cache_order': oo_order(s), + 'limit': 0, + 'batchcount': 0, + 'shared': 0, + 'shared_avail': 0} + +# SLAB-specific functions can be added here... 
+
+
+def cache_show(s, cfg):
+    if cfg['allocator'] == 'SLUB':
+        sinfo = slub_get_slabinfo(s, cfg)
+    else:
+        err('SLAB isn\'t supported yet')
+
+    print('%-17s %6lu %6lu %6u %4u %4d'
+          ' : tunables %4u %4u %4u'
+          ' : slabdata %6lu %6lu %6lu' % (
+              cache_name(s), sinfo['active_objs'], sinfo['num_objs'],
+              s.size, sinfo['objects_per_slab'], 1 << sinfo['cache_order'],
+              sinfo['limit'], sinfo['batchcount'], sinfo['shared'],
+              sinfo['active_slabs'], sinfo['num_slabs'],
+              sinfo['shared_avail']))
+
+
+def detect_kernel_config():
+    cfg = {}
+
+    cfg['nr_nodes'] = prog['nr_online_nodes'].value_()
+
+    if prog.type('struct kmem_cache').members[1][1] == 'flags':
+        cfg['allocator'] = 'SLUB'
+    elif prog.type('struct kmem_cache').members[1][1] == 'batchcount':
+        cfg['allocator'] = 'SLAB'
+    else:
+        err('Can\'t determine the slab allocator')
+
+    return cfg
+
+
+def main():
+    parser = argparse.ArgumentParser(description=DESC,
+                                     formatter_class=argparse.RawTextHelpFormatter)
+    parser.add_argument('cgroup', metavar='CGROUP',
+                        help='Target memory cgroup')
+    args = parser.parse_args()
+
+    try:
+        cgroup_id = stat(args.cgroup).st_ino
+        find_memcg_ids()
+        memcg = MEMCGS[cgroup_id]
+    except KeyError:
+        err('Can\'t find the memory cgroup')
+
+    cfg = detect_kernel_config()
+
+    print('# name            <active_objs> <num_objs> <objsize> <objperslab>'
+          ' <pagesperslab> : tunables <limit> <batchcount> <sharedfactor>'
+          ' : slabdata <active_slabs> <num_slabs> <sharedavail>')
+
+    for s in list_for_each_entry('struct kmem_cache',
+                                 memcg.kmem_caches.address_of_(),
+                                 'memcg_params.kmem_caches_node'):
+        cache_show(s, cfg)
+
+
+main()
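
[A usage sketch for the tool, added for illustration. drgn's default
target is the live kernel, which matches how the script resolves the
cgroup argument by stat()ing the path on the running system; pointing
drgn at a crash dump via its core-dump option is shown as a hypothetical,
since the cgroup inode then has to match the dumped system:]

	# live kernel: needs root and kernel debug info
	$ sudo ./tools/cgroup/slabinfo.py /sys/fs/cgroup/system.slice

	# hypothetical: run the same script against a core dump
	$ sudo drgn -c /var/crash/vmcore tools/cgroup/slabinfo.py /sys/fs/cgroup/system.slice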
From patchwork Fri Oct 18 00:28:17 2019
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11197403
From: Roman Gushchin
To: linux-mm@kvack.org
Cc: Michal Hocko, Johannes Weiner, linux-kernel@vger.kernel.org,
 kernel-team@fb.com, Shakeel Butt, Vladimir Davydov, Waiman Long,
 Christoph Lameter
Subject: [PATCH 13/16] mm: memcg/slab: deprecate memory.kmem.slabinfo
Date: Thu, 17 Oct 2019 17:28:17 -0700
Message-ID: <20191018002820.307763-14-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

Deprecate memory.kmem.slabinfo. An empty file will still be presented if
the corresponding config options are enabled.

The interface is implementation dependent, isn't present in cgroup v2,
and is generally useful only for core mm debugging purposes. In other
words, it doesn't provide any value for the absolute majority of users.

A drgn-based replacement can be found in tools/cgroup/slabinfo.py. It
supports both cgroup v1 and v2, mimics the memory.kmem.slabinfo output,
and makes it possible to obtain any additional information without
recompiling the kernel.

If a drgn-based solution is too slow for a task, a bpf-based tracing tool
can be used instead, which can easily keep track of all slab allocations
belonging to a memory cgroup.

Signed-off-by: Roman Gushchin
---
 mm/memcontrol.c  |  3 ---
 mm/slab_common.c | 31 ++++---------------------------
 2 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1982b14d6e6f..0c9698f03cfe 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5038,9 +5038,6 @@ static struct cftype mem_cgroup_legacy_files[] = {
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG)
 	{
 		.name = "kmem.slabinfo",
-		.seq_start = memcg_slab_start,
-		.seq_next = memcg_slab_next,
-		.seq_stop = memcg_slab_stop,
 		.seq_show = memcg_slab_show,
 	},
 #endif
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f0f7f955c5fa..cc0c70b57c1c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1509,35 +1509,12 @@ void dump_unreclaimable_slab(void)
 }
 
 #if defined(CONFIG_MEMCG)
-void *memcg_slab_start(struct seq_file *m, loff_t *pos)
-{
-	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
-
-	mutex_lock(&slab_mutex);
-	return seq_list_start(&memcg->kmem_caches, *pos);
-}
-
-void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos)
-{
-	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
-
-	return seq_list_next(p, &memcg->kmem_caches, pos);
-}
-
-void memcg_slab_stop(struct seq_file *m, void *p)
-{
-	mutex_unlock(&slab_mutex);
-}
-
 int memcg_slab_show(struct seq_file *m, void *p)
 {
-	struct kmem_cache *s = list_entry(p, struct kmem_cache,
-					  memcg_params.kmem_caches_node);
-	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
-
-	if (p == memcg->kmem_caches.next)
-		print_slabinfo_header(m);
-	cache_show(s, m);
+	/*
+	 * Deprecated.
+	 * Please, take a look at tools/cgroup/slabinfo.py .
+	 */
 	return 0;
 }
 #endif
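
[To make the bpf suggestion above concrete, a rough bpftrace sketch, an
illustration rather than a tested tool; it assumes a cgroup v2 hierarchy,
bpftrace's cgroup/cgroupid() builtins, and the kmem:kmem_cache_alloc
tracepoint layout of contemporary kernels. It counts slab allocations and
sums the allocated bytes for tasks in one cgroup:]

	$ sudo bpftrace -e '
	  tracepoint:kmem:kmem_cache_alloc
	  /cgroup == cgroupid("/sys/fs/cgroup/user.slice")/
	  {
		@allocs = count();
		@bytes = sum(args->bytes_alloc);
	  }'

[Note the difference in attribution: this counts allocations made by tasks
currently running in the given cgroup, while slabinfo.py reports which
memcg owns the objects, so the two views need not match exactly.]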
From patchwork Fri Oct 18 00:28:18 2019
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11197405
From: Roman Gushchin
To: linux-mm@kvack.org
Cc: Michal Hocko, Johannes Weiner, linux-kernel@vger.kernel.org,
 kernel-team@fb.com, Shakeel Butt, Vladimir Davydov, Waiman Long,
 Christoph Lameter
Subject: [PATCH 14/16] mm: memcg/slab: use one set of kmem_caches for all memory cgroups
Date: Thu, 17 Oct 2019 17:28:18 -0700
Message-ID: <20191018002820.307763-15-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

This is a fairly big but mostly red patch, which makes all non-root slab
allocations use a single set of kmem_caches instead of creating a separate
set for each memory cgroup.

Because the number of non-root kmem_caches is now capped by the number of
root kmem_caches, there is no need to shrink or destroy them prematurely.
They can simply be destroyed together with their root counterparts. This
allows us to dramatically simplify the management of non-root kmem_caches
and delete a ton of code.

This patch performs the following changes:
1) introduces the memcg_params.memcg_cache pointer to represent the
   kmem_cache which will be used for all non-root allocations
2) reuses the existing memcg kmem_cache creation mechanism to create the
   memcg kmem_cache on the first allocation attempt
3) names memcg kmem_caches by appending a "-memcg" suffix to the root
   cache name, e.g. dentry-memcg
4) simplifies memcg_kmem_get_cache() to just return the memcg kmem_cache
   or schedule its creation and return the root cache
5) removes almost all of the non-root kmem_cache management code
   (separate refcounter, reparenting, shrinking, etc.)
6) makes slab debugfs display the root_mem_cgroup css id and never show
   :dead and :deact flags in the memcg_slabinfo attribute.
Signed-off-by: Roman Gushchin --- include/linux/memcontrol.h | 5 +- include/linux/slab.h | 3 +- mm/memcontrol.c | 123 ++--------- mm/slab.c | 16 +- mm/slab.h | 121 ++++------- mm/slab_common.c | 414 ++++--------------------------------- mm/slub.c | 38 +--- 7 files changed, 106 insertions(+), 614 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a75980559a47..f36203cf75f8 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -327,7 +327,6 @@ struct mem_cgroup { /* Index in the kmem_cache->memcg_params.memcg_caches array */ int kmemcg_id; enum memcg_kmem_state kmem_state; - struct list_head kmem_caches; struct mem_cgroup_ptr __rcu *kmem_memcg_ptr; struct list_head kmem_memcg_ptr_list; #endif @@ -1436,9 +1435,7 @@ static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg, } #endif -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep, - struct mem_cgroup **memcgp); -void memcg_kmem_put_cache(struct kmem_cache *cachep); +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep); #ifdef CONFIG_MEMCG_KMEM int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order); diff --git a/include/linux/slab.h b/include/linux/slab.h index 877a95c6a2d2..246474e9c706 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -155,8 +155,7 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name, void kmem_cache_destroy(struct kmem_cache *); int kmem_cache_shrink(struct kmem_cache *); -void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *); -void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *); +void memcg_create_kmem_cache(struct kmem_cache *); /* * Please use this macro to create slab caches. Simply specify the diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0c9698f03cfe..b0d0c833150c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -330,7 +330,7 @@ static void memcg_reparent_kmem_memcg_ptr(struct mem_cgroup *memcg, } /* - * This will be the memcg's index in each cache's ->memcg_params.memcg_caches. + * This will be used as a shrinker list's index. * The main reason for not using cgroup id for this: * this works better in sparse environments, where we have a lot of memcgs, * but only a few kmem-limited. Or also, if we have, for instance, 200 @@ -2938,9 +2938,7 @@ static int memcg_alloc_cache_id(void) else if (size > MEMCG_CACHES_MAX_SIZE) size = MEMCG_CACHES_MAX_SIZE; - err = memcg_update_all_caches(size); - if (!err) - err = memcg_update_all_list_lrus(size); + err = memcg_update_all_list_lrus(size); if (!err) memcg_nr_cache_ids = size; @@ -2959,7 +2957,6 @@ static void memcg_free_cache_id(int id) } struct memcg_kmem_cache_create_work { - struct mem_cgroup *memcg; struct kmem_cache *cachep; struct work_struct work; }; @@ -2968,31 +2965,24 @@ static void memcg_kmem_cache_create_func(struct work_struct *w) { struct memcg_kmem_cache_create_work *cw = container_of(w, struct memcg_kmem_cache_create_work, work); - struct mem_cgroup *memcg = cw->memcg; struct kmem_cache *cachep = cw->cachep; - memcg_create_kmem_cache(memcg, cachep); + memcg_create_kmem_cache(cachep); - css_put(&memcg->css); kfree(cw); } /* * Enqueue the creation of a per-memcg kmem_cache. 
*/ -static void memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg, - struct kmem_cache *cachep) +static void memcg_schedule_kmem_cache_create(struct kmem_cache *cachep) { struct memcg_kmem_cache_create_work *cw; - if (!css_tryget_online(&memcg->css)) - return; - cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN); if (!cw) return; - cw->memcg = memcg; cw->cachep = cachep; INIT_WORK(&cw->work, memcg_kmem_cache_create_func); @@ -3000,96 +2990,26 @@ static void memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg, } /** - * memcg_kmem_get_cache: select the correct per-memcg cache for allocation + * memcg_kmem_get_cache: select memcg or root cache for allocation * @cachep: the original global kmem cache * * Return the kmem_cache we're supposed to use for a slab allocation. - * We try to use the current memcg's version of the cache. * * If the cache does not exist yet, if we are the first user of it, we * create it asynchronously in a workqueue and let the current allocation * go through with the original cache. - * - * This function takes a reference to the cache it returns to assure it - * won't get destroyed while we are working with it. Once the caller is - * done with it, memcg_kmem_put_cache() must be called to release the - * reference. */ -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep, - struct mem_cgroup **memcgp) +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep) { - struct mem_cgroup *memcg; struct kmem_cache *memcg_cachep; - struct memcg_cache_array *arr; - int kmemcg_id; - VM_BUG_ON(!is_root_cache(cachep)); - - if (memcg_kmem_bypass()) + memcg_cachep = READ_ONCE(cachep->memcg_params.memcg_cache); + if (unlikely(!memcg_cachep)) { + memcg_schedule_kmem_cache_create(cachep); return cachep; - - rcu_read_lock(); - - if (unlikely(current->active_memcg)) - memcg = current->active_memcg; - else - memcg = mem_cgroup_from_task(current); - - if (!memcg || memcg == root_mem_cgroup) - goto out_unlock; - - kmemcg_id = READ_ONCE(memcg->kmemcg_id); - if (kmemcg_id < 0) - goto out_unlock; - - arr = rcu_dereference(cachep->memcg_params.memcg_caches); - - /* - * Make sure we will access the up-to-date value. The code updating - * memcg_caches issues a write barrier to match the data dependency - * barrier inside READ_ONCE() (see memcg_create_kmem_cache()). - */ - memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]); - - /* - * If we are in a safe context (can wait, and not in interrupt - * context), we could be be predictable and return right away. - * This would guarantee that the allocation being performed - * already belongs in the new cache. - * - * However, there are some clashes that can arrive from locking. - * For instance, because we acquire the slab_mutex while doing - * memcg_create_kmem_cache, this means no further allocation - * could happen with the slab_mutex held. So it's better to - * defer everything. - * - * If the memcg is dying or memcg_cache is about to be released, - * don't bother creating new kmem_caches. Because memcg_cachep - * is ZEROed as the fist step of kmem offlining, we don't need - * percpu_ref_tryget_live() here. css_tryget_online() check in - * memcg_schedule_kmem_cache_create() will prevent us from - * creation of a new kmem_cache. 
- */ - if (unlikely(!memcg_cachep)) - memcg_schedule_kmem_cache_create(memcg, cachep); - else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt)) { - css_get(&memcg->css); - *memcgp = memcg; - cachep = memcg_cachep; } -out_unlock: - rcu_read_unlock(); - return cachep; -} -/** - * memcg_kmem_put_cache: drop reference taken by memcg_kmem_get_cache - * @cachep: the cache returned by memcg_kmem_get_cache - */ -void memcg_kmem_put_cache(struct kmem_cache *cachep) -{ - if (!is_root_cache(cachep)) - percpu_ref_put(&cachep->memcg_params.refcnt); + return memcg_cachep; } /** @@ -3669,7 +3589,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg) */ memcg->kmemcg_id = memcg_id; memcg->kmem_state = KMEM_ONLINE; - INIT_LIST_HEAD(&memcg->kmem_caches); return 0; } @@ -3682,12 +3601,7 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg) if (memcg->kmem_state != KMEM_ONLINE) return; - /* - * Clear the online state before clearing memcg_caches array - * entries. The slab_mutex in memcg_deactivate_kmem_caches() - * guarantees that no cache will be created for this cgroup - * after we are done (see memcg_create_kmem_cache()). - */ + memcg->kmem_state = KMEM_ALLOCATED; parent = parent_mem_cgroup(memcg); @@ -3695,12 +3609,10 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg) parent = root_mem_cgroup; /* - * Deactivate and reparent kmem_caches and reparent kmem_memcg_ptr. - * Then flush percpu slab statistics to have precise values at the - * parent and all ancestor levels. It's required to keep slab stats - * accurate after the reparenting of kmem_caches. + * Reparent kmem_memcg_ptr. Then flush percpu slab statistics to have + * precise values at the parent and all ancestor levels. It's required + * to keep slab stats accurate after the reparenting of kmem_caches. */ - memcg_deactivate_kmem_caches(memcg, parent); memcg_reparent_kmem_memcg_ptr(memcg, parent); memcg_flush_percpu_vmstats(memcg, true); @@ -3736,10 +3648,8 @@ static void memcg_free_kmem(struct mem_cgroup *memcg) if (unlikely(memcg->kmem_state == KMEM_ONLINE)) memcg_offline_kmem(memcg); - if (memcg->kmem_state == KMEM_ALLOCATED) { - WARN_ON(!list_empty(&memcg->kmem_caches)); + if (memcg->kmem_state == KMEM_ALLOCATED) static_branch_dec(&memcg_kmem_enabled_key); - } } #else static int memcg_online_kmem(struct mem_cgroup *memcg) @@ -5316,9 +5226,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) /* The following stuff does not apply to the root */ if (!parent) { -#ifdef CONFIG_MEMCG_KMEM - INIT_LIST_HEAD(&memcg->kmem_caches); -#endif root_mem_cgroup = memcg; return &memcg->css; } diff --git a/mm/slab.c b/mm/slab.c index 91cd8bc4ee07..0914d7cd869f 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1239,7 +1239,7 @@ void __init kmem_cache_init(void) nr_node_ids * sizeof(struct kmem_cache_node *), SLAB_HWCACHE_ALIGN, 0, 0); list_add(&kmem_cache->list, &slab_caches); - memcg_link_cache(kmem_cache, NULL); + memcg_link_cache(kmem_cache); slab_state = PARTIAL; /* @@ -2244,17 +2244,6 @@ int __kmem_cache_shrink(struct kmem_cache *cachep) return (ret ? 
1 : 0); } -#ifdef CONFIG_MEMCG -void __kmemcg_cache_deactivate(struct kmem_cache *cachep) -{ - __kmem_cache_shrink(cachep); -} - -void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s) -{ -} -#endif - int __kmem_cache_shutdown(struct kmem_cache *cachep) { return __kmem_cache_shrink(cachep); @@ -3862,7 +3851,8 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit, return ret; lockdep_assert_held(&slab_mutex); - for_each_memcg_cache(c, cachep) { + c = memcg_cache(cachep); + if (c) { /* return value determined by the root cache only */ __do_tune_cpucache(c, limit, batchcount, shared, gfp); } diff --git a/mm/slab.h b/mm/slab.h index a6330065d434..035c2969a2ca 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -32,66 +32,25 @@ struct kmem_cache { #else /* !CONFIG_SLOB */ -struct memcg_cache_array { - struct rcu_head rcu; - struct kmem_cache *entries[0]; -}; - /* * This is the main placeholder for memcg-related information in kmem caches. - * Both the root cache and the child caches will have it. For the root cache, - * this will hold a dynamically allocated array large enough to hold - * information about the currently limited memcgs in the system. To allow the - * array to be accessed without taking any locks, on relocation we free the old - * version only after a grace period. - * - * Root and child caches hold different metadata. + * Both the root cache and the child cache will have it. Some fields are used + * in both cases, other are specific to root caches. * * @root_cache: Common to root and child caches. NULL for root, pointer to * the root cache for children. * * The following fields are specific to root caches. * - * @memcg_caches: kmemcg ID indexed table of child caches. This table is - * used to index child cachces during allocation and cleared - * early during shutdown. - * - * @root_caches_node: List node for slab_root_caches list. - * - * @children: List of all child caches. While the child caches are also - * reachable through @memcg_caches, a child cache remains on - * this list until it is actually destroyed. - * - * The following fields are specific to child caches. - * - * @memcg: Pointer to the memcg this cache belongs to. - * - * @children_node: List node for @root_cache->children list. - * - * @kmem_caches_node: List node for @memcg->kmem_caches list. + * @memcg_cache: pointer to memcg kmem cache, used by all non-root memory + * cgroups. + * @root_caches_node: list node for slab_root_caches list. 
*/ struct memcg_cache_params { struct kmem_cache *root_cache; - union { - struct { - struct memcg_cache_array __rcu *memcg_caches; - struct list_head __root_caches_node; - struct list_head children; - bool dying; - }; - struct { - struct mem_cgroup *memcg; - struct list_head children_node; - struct list_head kmem_caches_node; - struct percpu_ref refcnt; - - void (*work_fn)(struct kmem_cache *); - union { - struct rcu_head rcu_head; - struct work_struct work; - }; - }; - }; + + struct kmem_cache *memcg_cache; + struct list_head __root_caches_node; }; #endif /* CONFIG_SLOB */ @@ -234,8 +193,6 @@ bool __kmem_cache_empty(struct kmem_cache *); int __kmem_cache_shutdown(struct kmem_cache *); void __kmem_cache_release(struct kmem_cache *); int __kmem_cache_shrink(struct kmem_cache *); -void __kmemcg_cache_deactivate(struct kmem_cache *s); -void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s); void slab_kmem_cache_release(struct kmem_cache *); void kmem_cache_shrink_all(struct kmem_cache *s); @@ -281,14 +238,6 @@ static inline int cache_vmstat_idx(struct kmem_cache *s) extern struct list_head slab_root_caches; #define root_caches_node memcg_params.__root_caches_node -/* - * Iterate over all memcg caches of the given root cache. The caller must hold - * slab_mutex. - */ -#define for_each_memcg_cache(iter, root) \ - list_for_each_entry(iter, &(root)->memcg_params.children, \ - memcg_params.children_node) - static inline bool is_root_cache(struct kmem_cache *s) { return !s->memcg_params.root_cache; @@ -319,6 +268,13 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) return s->memcg_params.root_cache; } +static inline struct kmem_cache *memcg_cache(struct kmem_cache *s) +{ + if (is_root_cache(s)) + return s->memcg_params.memcg_cache; + return NULL; +} + /* * Expects a pointer to a slab page. 
Please note, that PageSlab() check * isn't sufficient, as it returns true also for tail compound slab pages, @@ -367,17 +323,27 @@ static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s, size_t size, gfp_t flags) { struct kmem_cache *cachep; + struct mem_cgroup *memcg; + + if (memcg_kmem_bypass()) + return s; - cachep = memcg_kmem_get_cache(s, memcgp); - if (is_root_cache(cachep)) + memcg = get_mem_cgroup_from_current(); + if (!memcg || mem_cgroup_is_root(memcg)) return s; - if (__memcg_kmem_charge_subpage(*memcgp, size * s->size, flags)) { - mem_cgroup_put(*memcgp); - memcg_kmem_put_cache(cachep); - cachep = NULL; + cachep = memcg_kmem_get_cache(s); + if (is_root_cache(cachep)) { + mem_cgroup_put(memcg); + return s; } + if (__memcg_kmem_charge_subpage(memcg, size * s->size, flags)) { + mem_cgroup_put(memcg); + return NULL; + } + + *memcgp = memcg; return cachep; } @@ -407,8 +373,6 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, } mem_cgroup_ptr_put(memcg_ptr); mem_cgroup_put(memcg); - - memcg_kmem_put_cache(s); } static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page, @@ -437,7 +401,7 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page, } extern void slab_init_memcg_params(struct kmem_cache *); -extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg); +extern void memcg_link_cache(struct kmem_cache *s); #else /* CONFIG_MEMCG_KMEM */ @@ -445,9 +409,6 @@ extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg); #define slab_root_caches slab_caches #define root_caches_node list -#define for_each_memcg_cache(iter, root) \ - for ((void)(iter), (void)(root); 0; ) - static inline bool is_root_cache(struct kmem_cache *s) { return true; @@ -469,6 +430,11 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) return s; } +static inline struct kmem_cache *memcg_cache(struct kmem_cache *s) +{ + return NULL; +} + static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr) { return NULL; @@ -506,8 +472,7 @@ static inline void slab_init_memcg_params(struct kmem_cache *s) { } -static inline void memcg_link_cache(struct kmem_cache *s, - struct mem_cgroup *memcg) +static inline void memcg_link_cache(struct kmem_cache *s) { } @@ -535,8 +500,6 @@ static __always_inline int charge_slab_page(struct page *page, ret = memcg_alloc_page_memcg_vec(page, gfp, objects); if (ret) return ret; - - percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order); } mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), PAGE_SIZE << order); @@ -546,10 +509,9 @@ static __always_inline int charge_slab_page(struct page *page, static __always_inline void uncharge_slab_page(struct page *page, int order, struct kmem_cache *s) { - if (!is_root_cache(s)) { + if (!is_root_cache(s)) memcg_free_page_memcg_vec(page); - percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order); - } + mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), -(PAGE_SIZE << order)); } @@ -698,9 +660,6 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node) void *slab_start(struct seq_file *m, loff_t *pos); void *slab_next(struct seq_file *m, void *p, loff_t *pos); void slab_stop(struct seq_file *m, void *p); -void *memcg_slab_start(struct seq_file *m, loff_t *pos); -void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos); -void memcg_slab_stop(struct seq_file *m, void *p); int memcg_slab_show(struct seq_file *m, void *p); #if defined(CONFIG_SLAB) || 
defined(CONFIG_SLUB_DEBUG) diff --git a/mm/slab_common.c b/mm/slab_common.c index cc0c70b57c1c..3dcd90ba7525 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -131,141 +131,36 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr, #ifdef CONFIG_MEMCG_KMEM LIST_HEAD(slab_root_caches); -static DEFINE_SPINLOCK(memcg_kmem_wq_lock); - -static void kmemcg_cache_shutdown(struct percpu_ref *percpu_ref); void slab_init_memcg_params(struct kmem_cache *s) { s->memcg_params.root_cache = NULL; - RCU_INIT_POINTER(s->memcg_params.memcg_caches, NULL); - INIT_LIST_HEAD(&s->memcg_params.children); - s->memcg_params.dying = false; + s->memcg_params.memcg_cache = NULL; } -static int init_memcg_params(struct kmem_cache *s, - struct kmem_cache *root_cache) +static void init_memcg_params(struct kmem_cache *s, + struct kmem_cache *root_cache) { - struct memcg_cache_array *arr; - - if (root_cache) { - int ret = percpu_ref_init(&s->memcg_params.refcnt, - kmemcg_cache_shutdown, - 0, GFP_KERNEL); - if (ret) - return ret; - + if (root_cache) s->memcg_params.root_cache = root_cache; - INIT_LIST_HEAD(&s->memcg_params.children_node); - INIT_LIST_HEAD(&s->memcg_params.kmem_caches_node); - return 0; - } - - slab_init_memcg_params(s); - - if (!memcg_nr_cache_ids) - return 0; - - arr = kvzalloc(sizeof(struct memcg_cache_array) + - memcg_nr_cache_ids * sizeof(void *), - GFP_KERNEL); - if (!arr) - return -ENOMEM; - - RCU_INIT_POINTER(s->memcg_params.memcg_caches, arr); - return 0; -} - -static void destroy_memcg_params(struct kmem_cache *s) -{ - if (is_root_cache(s)) { - kvfree(rcu_access_pointer(s->memcg_params.memcg_caches)); - } else { - mem_cgroup_put(s->memcg_params.memcg); - WRITE_ONCE(s->memcg_params.memcg, NULL); - percpu_ref_exit(&s->memcg_params.refcnt); - } + else + slab_init_memcg_params(s); } -static void free_memcg_params(struct rcu_head *rcu) +void memcg_link_cache(struct kmem_cache *s) { - struct memcg_cache_array *old; - - old = container_of(rcu, struct memcg_cache_array, rcu); - kvfree(old); -} - -static int update_memcg_params(struct kmem_cache *s, int new_array_size) -{ - struct memcg_cache_array *old, *new; - - new = kvzalloc(sizeof(struct memcg_cache_array) + - new_array_size * sizeof(void *), GFP_KERNEL); - if (!new) - return -ENOMEM; - - old = rcu_dereference_protected(s->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - if (old) - memcpy(new->entries, old->entries, - memcg_nr_cache_ids * sizeof(void *)); - - rcu_assign_pointer(s->memcg_params.memcg_caches, new); - if (old) - call_rcu(&old->rcu, free_memcg_params); - return 0; -} - -int memcg_update_all_caches(int num_memcgs) -{ - struct kmem_cache *s; - int ret = 0; - - mutex_lock(&slab_mutex); - list_for_each_entry(s, &slab_root_caches, root_caches_node) { - ret = update_memcg_params(s, num_memcgs); - /* - * Instead of freeing the memory, we'll just leave the caches - * up to this point in an updated state. 
- */ - if (ret) - break; - } - mutex_unlock(&slab_mutex); - return ret; -} - -void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg) -{ - if (is_root_cache(s)) { + if (is_root_cache(s)) list_add(&s->root_caches_node, &slab_root_caches); - } else { - css_get(&memcg->css); - s->memcg_params.memcg = memcg; - list_add(&s->memcg_params.children_node, - &s->memcg_params.root_cache->memcg_params.children); - list_add(&s->memcg_params.kmem_caches_node, - &s->memcg_params.memcg->kmem_caches); - } } static void memcg_unlink_cache(struct kmem_cache *s) { - if (is_root_cache(s)) { + if (is_root_cache(s)) list_del(&s->root_caches_node); - } else { - list_del(&s->memcg_params.children_node); - list_del(&s->memcg_params.kmem_caches_node); - } } #else -static inline int init_memcg_params(struct kmem_cache *s, - struct kmem_cache *root_cache) -{ - return 0; -} - -static inline void destroy_memcg_params(struct kmem_cache *s) +static inline void init_memcg_params(struct kmem_cache *s, + struct kmem_cache *root_cache) { } @@ -380,7 +275,7 @@ static struct kmem_cache *create_cache(const char *name, unsigned int object_size, unsigned int align, slab_flags_t flags, unsigned int useroffset, unsigned int usersize, void (*ctor)(void *), - struct mem_cgroup *memcg, struct kmem_cache *root_cache) + struct kmem_cache *root_cache) { struct kmem_cache *s; int err; @@ -400,24 +295,20 @@ static struct kmem_cache *create_cache(const char *name, s->useroffset = useroffset; s->usersize = usersize; - err = init_memcg_params(s, root_cache); - if (err) - goto out_free_cache; - + init_memcg_params(s, root_cache); err = __kmem_cache_create(s, flags); if (err) goto out_free_cache; s->refcount = 1; list_add(&s->list, &slab_caches); - memcg_link_cache(s, memcg); + memcg_link_cache(s); out: if (err) return ERR_PTR(err); return s; out_free_cache: - destroy_memcg_params(s); kmem_cache_free(kmem_cache, s); goto out; } @@ -504,7 +395,7 @@ kmem_cache_create_usercopy(const char *name, s = create_cache(cache_name, size, calculate_alignment(flags, align, size), - flags, useroffset, usersize, ctor, NULL, NULL); + flags, useroffset, usersize, ctor, NULL); if (IS_ERR(s)) { err = PTR_ERR(s); kfree_const(cache_name); @@ -629,51 +520,27 @@ static int shutdown_cache(struct kmem_cache *s) #ifdef CONFIG_MEMCG_KMEM /* - * memcg_create_kmem_cache - Create a cache for a memory cgroup. - * @memcg: The memory cgroup the new cache is for. + * memcg_create_kmem_cache - Create a cache for non-root memory cgroups. * @root_cache: The parent of the new cache. * * This function attempts to create a kmem cache that will serve allocation - * requests going from @memcg to @root_cache. The new cache inherits properties - * from its parent. + * requests going all non-root memory cgroups to @root_cache. The new cache + * inherits properties from its parent. */ -void memcg_create_kmem_cache(struct mem_cgroup *memcg, - struct kmem_cache *root_cache) +void memcg_create_kmem_cache(struct kmem_cache *root_cache) { - static char memcg_name_buf[NAME_MAX + 1]; /* protected by slab_mutex */ - struct cgroup_subsys_state *css = &memcg->css; - struct memcg_cache_array *arr; struct kmem_cache *s = NULL; char *cache_name; - int idx; get_online_cpus(); get_online_mems(); mutex_lock(&slab_mutex); - /* - * The memory cgroup could have been offlined while the cache - * creation work was pending. 
- */ - if (memcg->kmem_state != KMEM_ONLINE) - goto out_unlock; - - idx = memcg_cache_id(memcg); - arr = rcu_dereference_protected(root_cache->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - - /* - * Since per-memcg caches are created asynchronously on first - * allocation (see memcg_kmem_get_cache()), several threads can try to - * create the same cache, but only one of them may succeed. - */ - if (arr->entries[idx]) + if (root_cache->memcg_params.memcg_cache) goto out_unlock; - cgroup_name(css->cgroup, memcg_name_buf, sizeof(memcg_name_buf)); - cache_name = kasprintf(GFP_KERNEL, "%s(%llu:%s)", root_cache->name, - css->serial_nr, memcg_name_buf); + cache_name = kasprintf(GFP_KERNEL, "%s-memcg", root_cache->name); if (!cache_name) goto out_unlock; @@ -681,7 +548,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg, root_cache->align, root_cache->flags & CACHE_CREATE_MASK, root_cache->useroffset, root_cache->usersize, - root_cache->ctor, memcg, root_cache); + root_cache->ctor, root_cache); /* * If we could not create a memcg cache, do not complain, because * that's not critical at all as we can always proceed with the root @@ -698,7 +565,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg, * initialized. */ smp_wmb(); - arr->entries[idx] = s; + root_cache->memcg_params.memcg_cache = s; out_unlock: mutex_unlock(&slab_mutex); @@ -707,197 +574,18 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg, put_online_cpus(); } -static void kmemcg_workfn(struct work_struct *work) -{ - struct kmem_cache *s = container_of(work, struct kmem_cache, - memcg_params.work); - - get_online_cpus(); - get_online_mems(); - - mutex_lock(&slab_mutex); - s->memcg_params.work_fn(s); - mutex_unlock(&slab_mutex); - - put_online_mems(); - put_online_cpus(); -} - -static void kmemcg_rcufn(struct rcu_head *head) -{ - struct kmem_cache *s = container_of(head, struct kmem_cache, - memcg_params.rcu_head); - - /* - * We need to grab blocking locks. Bounce to ->work. The - * work item shares the space with the RCU head and can't be - * initialized eariler. - */ - INIT_WORK(&s->memcg_params.work, kmemcg_workfn); - queue_work(memcg_kmem_cache_wq, &s->memcg_params.work); -} - -static void kmemcg_cache_shutdown_fn(struct kmem_cache *s) -{ - WARN_ON(shutdown_cache(s)); -} - -static void kmemcg_cache_shutdown(struct percpu_ref *percpu_ref) -{ - struct kmem_cache *s = container_of(percpu_ref, struct kmem_cache, - memcg_params.refcnt); - unsigned long flags; - - spin_lock_irqsave(&memcg_kmem_wq_lock, flags); - if (s->memcg_params.root_cache->memcg_params.dying) - goto unlock; - - s->memcg_params.work_fn = kmemcg_cache_shutdown_fn; - INIT_WORK(&s->memcg_params.work, kmemcg_workfn); - queue_work(memcg_kmem_cache_wq, &s->memcg_params.work); - -unlock: - spin_unlock_irqrestore(&memcg_kmem_wq_lock, flags); -} - -static void kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s) -{ - __kmemcg_cache_deactivate_after_rcu(s); - percpu_ref_kill(&s->memcg_params.refcnt); -} - -static void kmemcg_cache_deactivate(struct kmem_cache *s) -{ - if (WARN_ON_ONCE(is_root_cache(s))) - return; - - __kmemcg_cache_deactivate(s); - s->flags |= SLAB_DEACTIVATED; - - /* - * memcg_kmem_wq_lock is used to synchronize memcg_params.dying - * flag and make sure that no new kmem_cache deactivation tasks - * are queued (see flush_memcg_workqueue() ). 
- */ - spin_lock_irq(&memcg_kmem_wq_lock); - if (s->memcg_params.root_cache->memcg_params.dying) - goto unlock; - - s->memcg_params.work_fn = kmemcg_cache_deactivate_after_rcu; - call_rcu(&s->memcg_params.rcu_head, kmemcg_rcufn); -unlock: - spin_unlock_irq(&memcg_kmem_wq_lock); -} - -void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg, - struct mem_cgroup *parent) -{ - int idx; - struct memcg_cache_array *arr; - struct kmem_cache *s, *c; - unsigned int nr_reparented; - - idx = memcg_cache_id(memcg); - - get_online_cpus(); - get_online_mems(); - - mutex_lock(&slab_mutex); - list_for_each_entry(s, &slab_root_caches, root_caches_node) { - arr = rcu_dereference_protected(s->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - c = arr->entries[idx]; - if (!c) - continue; - - kmemcg_cache_deactivate(c); - arr->entries[idx] = NULL; - } - nr_reparented = 0; - list_for_each_entry(s, &memcg->kmem_caches, - memcg_params.kmem_caches_node) { - WRITE_ONCE(s->memcg_params.memcg, parent); - css_put(&memcg->css); - nr_reparented++; - } - if (nr_reparented) { - list_splice_init(&memcg->kmem_caches, - &parent->kmem_caches); - css_get_many(&parent->css, nr_reparented); - } - mutex_unlock(&slab_mutex); - - put_online_mems(); - put_online_cpus(); -} - static int shutdown_memcg_caches(struct kmem_cache *s) { - struct memcg_cache_array *arr; - struct kmem_cache *c, *c2; - LIST_HEAD(busy); - int i; - BUG_ON(!is_root_cache(s)); - /* - * First, shutdown active caches, i.e. caches that belong to online - * memory cgroups. - */ - arr = rcu_dereference_protected(s->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - for_each_memcg_cache_index(i) { - c = arr->entries[i]; - if (!c) - continue; - if (shutdown_cache(c)) - /* - * The cache still has objects. Move it to a temporary - * list so as not to try to destroy it for a second - * time while iterating over inactive caches below. - */ - list_move(&c->memcg_params.children_node, &busy); - else - /* - * The cache is empty and will be destroyed soon. Clear - * the pointer to it in the memcg_caches array so that - * it will never be accessed even if the root cache - * stays alive. - */ - arr->entries[i] = NULL; - } - - /* - * Second, shutdown all caches left from memory cgroups that are now - * offline. - */ - list_for_each_entry_safe(c, c2, &s->memcg_params.children, - memcg_params.children_node) - shutdown_cache(c); - - list_splice(&busy, &s->memcg_params.children); + if (s->memcg_params.memcg_cache) + WARN_ON(shutdown_cache(s->memcg_params.memcg_cache)); - /* - * A cache being destroyed must be empty. In particular, this means - * that all per memcg caches attached to it must be empty too. - */ - if (!list_empty(&s->memcg_params.children)) - return -EBUSY; return 0; } static void flush_memcg_workqueue(struct kmem_cache *s) { - spin_lock_irq(&memcg_kmem_wq_lock); - s->memcg_params.dying = true; - spin_unlock_irq(&memcg_kmem_wq_lock); - - /* - * SLAB and SLUB deactivate the kmem_caches through call_rcu. Make - * sure all registered rcu callbacks have been invoked. - */ - rcu_barrier(); - /* * SLAB and SLUB create memcg kmem_caches through workqueue and SLUB * deactivates the memcg kmem_caches through workqueue. 
Make sure all @@ -919,7 +607,6 @@ static inline void flush_memcg_workqueue(struct kmem_cache *s) void slab_kmem_cache_release(struct kmem_cache *s) { __kmem_cache_release(s); - destroy_memcg_params(s); kfree_const(s->name); kmem_cache_free(kmem_cache, s); } @@ -983,7 +670,7 @@ int kmem_cache_shrink(struct kmem_cache *cachep) EXPORT_SYMBOL(kmem_cache_shrink); /** - * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache + * kmem_cache_shrink_all - shrink root and memcg caches * @s: The cache pointer */ void kmem_cache_shrink_all(struct kmem_cache *s) @@ -1000,21 +687,11 @@ void kmem_cache_shrink_all(struct kmem_cache *s) kasan_cache_shrink(s); __kmem_cache_shrink(s); - /* - * We have to take the slab_mutex to protect from the memcg list - * modification. - */ - mutex_lock(&slab_mutex); - for_each_memcg_cache(c, s) { - /* - * Don't need to shrink deactivated memcg caches. - */ - if (s->flags & SLAB_DEACTIVATED) - continue; + c = memcg_cache(s); + if (c) { kasan_cache_shrink(c); __kmem_cache_shrink(c); } - mutex_unlock(&slab_mutex); put_online_mems(); put_online_cpus(); } @@ -1069,7 +746,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, create_boot_cache(s, name, size, flags, useroffset, usersize); list_add(&s->list, &slab_caches); - memcg_link_cache(s, NULL); + memcg_link_cache(s); s->refcount = 1; return s; } @@ -1431,7 +1108,8 @@ memcg_accumulate_slabinfo(struct kmem_cache *s, struct slabinfo *info) if (!is_root_cache(s)) return; - for_each_memcg_cache(c, s) { + c = memcg_cache(s); + if (c) { memset(&sinfo, 0, sizeof(sinfo)); get_slabinfo(c, &sinfo); @@ -1562,7 +1240,7 @@ module_init(slab_proc_init); #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM) /* - * Display information about kmem caches that have child memcg caches. + * Display information about kmem caches that have a memcg cache. */ static int memcg_slabinfo_show(struct seq_file *m, void *unused) { @@ -1574,9 +1252,9 @@ static int memcg_slabinfo_show(struct seq_file *m, void *unused) seq_puts(m, " \n"); list_for_each_entry(s, &slab_root_caches, root_caches_node) { /* - * Skip kmem caches that don't have any memcg children. + * Skip kmem caches that don't have a memcg cache.
*/ - if (list_empty(&s->memcg_params.children)) + if (!s->memcg_params.memcg_cache) continue; memset(&sinfo, 0, sizeof(sinfo)); @@ -1585,23 +1263,13 @@ static int memcg_slabinfo_show(struct seq_file *m, void *unused) cache_name(s), sinfo.active_objs, sinfo.num_objs, sinfo.active_slabs, sinfo.num_slabs); - for_each_memcg_cache(c, s) { - struct cgroup_subsys_state *css; - char *status = ""; - - css = &c->memcg_params.memcg->css; - if (!(css->flags & CSS_ONLINE)) - status = ":dead"; - else if (c->flags & SLAB_DEACTIVATED) - status = ":deact"; - - memset(&sinfo, 0, sizeof(sinfo)); - get_slabinfo(c, &sinfo); - seq_printf(m, "%-17s %4d%-6s %6lu %6lu %6lu %6lu\n", - cache_name(c), css->id, status, - sinfo.active_objs, sinfo.num_objs, - sinfo.active_slabs, sinfo.num_slabs); - } + c = s->memcg_params.memcg_cache; + memset(&sinfo, 0, sizeof(sinfo)); + get_slabinfo(c, &sinfo); + seq_printf(m, "%-17s %4d %6lu %6lu %6lu %6lu\n", + cache_name(c), root_mem_cgroup->css.id, + sinfo.active_objs, sinfo.num_objs, + sinfo.active_slabs, sinfo.num_slabs); } mutex_unlock(&slab_mutex); return 0; diff --git a/mm/slub.c b/mm/slub.c index a62545c7acac..53abbf0831b6 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4057,36 +4057,6 @@ int __kmem_cache_shrink(struct kmem_cache *s) return ret; } -#ifdef CONFIG_MEMCG -void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s) -{ - /* - * Called with all the locks held after a sched RCU grace period. - * Even if @s becomes empty after shrinking, we can't know that @s - * doesn't have allocations already in-flight and thus can't - * destroy @s until the associated memcg is released. - * - * However, let's remove the sysfs files for empty caches here. - * Each cache has a lot of interface files which aren't - * particularly useful for empty draining caches; otherwise, we can - * easily end up with millions of unnecessary sysfs files on - * systems which have a lot of memory and transient cgroups. - */ - if (!__kmem_cache_shrink(s)) - sysfs_slab_remove(s); -} - -void __kmemcg_cache_deactivate(struct kmem_cache *s) -{ - /* - * Disable empty slabs caching. Used to avoid pinning offline - * memory cgroups by kmem pages that can be freed. - */ - slub_set_cpu_partial(s, 0); - s->min_partial = 0; -} -#endif /* CONFIG_MEMCG */ - static int slab_mem_going_offline_callback(void *arg) { struct kmem_cache *s; @@ -4243,7 +4213,7 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache) } slab_init_memcg_params(s); list_add(&s->list, &slab_caches); - memcg_link_cache(s, NULL); + memcg_link_cache(s); return s; } @@ -4311,7 +4281,8 @@ __kmem_cache_alias(const char *name, unsigned int size, unsigned int align, s->object_size = max(s->object_size, size); s->inuse = max(s->inuse, ALIGN(size, sizeof(void *))); - for_each_memcg_cache(c, s) { + c = memcg_cache(s); + if (c) { c->object_size = s->object_size; c->inuse = max(c->inuse, ALIGN(size, sizeof(void *))); } @@ -5582,7 +5553,8 @@ static ssize_t slab_attr_store(struct kobject *kobj, * directly either failed or succeeded, in which case we loop * through the descendants with best-effort propagation. 
*/ - for_each_memcg_cache(c, s) + c = memcg_cache(s); + if (c) attribute->store(c, buf, len); mutex_unlock(&slab_mutex); }

From patchwork Fri Oct 18 00:28:19 2019
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11197401
From: Roman Gushchin
Cc: Michal Hocko, Johannes Weiner, Shakeel Butt, Vladimir Davydov, Waiman Long, Christoph Lameter, Roman Gushchin
Subject: [PATCH 15/16] tools/cgroup: make slabinfo.py compatible with new slab controller
Date: Thu, 17 Oct 2019 17:28:19 -0700
Message-ID: <20191018002820.307763-16-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>

Make slabinfo.py compatible with the new slab controller. Because there are no per-memcg kmem_caches anymore, and there is no list of all slab pages in the system either, the script has to walk over all pages, filter out slab pages belonging to non-root kmem_caches, and then count the objects belonging to the given cgroup.

It might sound like a very slow operation, but it's not that slow: it takes about 30 seconds to walk over 8 GB of slabs out of 64 GB and filter out all objects belonging to the cgroup of interest. It also provides an accurate number of active objects, which wasn't true for the old slab controller.

The script is backward compatible and works with both kernel versions.
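For readers skimming the thread, the core of the new mode can be sketched roughly as follows. This is a simplified, hypothetical condensation, not the patch itself: for_each_slab_page(), is_root_cache() and oo_objects() are helpers defined in slabinfo.py, prog is the drgn program object, and memcg_ptrs stands for the set of mem_cgroup_ptr addresses collected for the target cgroup.

    # Walk every slab page, skip pages of root caches, and count objects
    # whose mem_cgroup_vec entry points into the target cgroup's set of
    # mem_cgroup_ptr addresses.
    stats = {}
    for page in for_each_slab_page(prog):
        cache = page.slab_cache
        if not cache or is_root_cache(cache):
            continue
        addr = cache.value_()
        stats.setdefault(addr, 0)
        for i in range(oo_objects(cache)):
            if page.mem_cgroup_vec[i].value_() in memcg_ptrs:
                stats[addr] += 1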
Signed-off-by: Roman Gushchin --- tools/cgroup/slabinfo.py | 105 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 98 insertions(+), 7 deletions(-) diff --git a/tools/cgroup/slabinfo.py b/tools/cgroup/slabinfo.py index 40b01a6ec4b0..79909ee36fe3 100755 --- a/tools/cgroup/slabinfo.py +++ b/tools/cgroup/slabinfo.py @@ -8,7 +8,10 @@ import argparse import sys from drgn.helpers.linux import list_for_each_entry, list_empty -from drgn import container_of +from drgn.helpers.linux import for_each_page +from drgn.helpers.linux.cpumask import for_each_online_cpu +from drgn.helpers.linux.percpu import per_cpu_ptr +from drgn import container_of, FaultError DESC = """ @@ -47,7 +50,7 @@ def is_root_cache(s): def cache_name(s): if is_root_cache(s): - return s.name + return s.name.string_().decode('utf-8') else: return s.memcg_params.root_cache.name.string_().decode('utf-8') @@ -99,12 +102,16 @@ def slub_get_slabinfo(s, cfg): # SLAB-specific functions can be added here... -def cache_show(s, cfg): +def cache_show(s, cfg, objs): if cfg['allocator'] == 'SLUB': sinfo = slub_get_slabinfo(s, cfg) else: err('SLAB isn\'t supported yet') + if cfg['shared_slab_pages']: + sinfo['active_objs'] = objs + sinfo['num_objs'] = objs + print('%-17s %6lu %6lu %6u %4u %4d' ' : tunables %4u %4u %4u' ' : slabdata %6lu %6lu %6lu' % ( @@ -127,9 +134,60 @@ def detect_kernel_config(): else: err('Can\'t determine the slab allocator') + if prog.type('struct memcg_cache_params').members[1][1] == 'memcg_cache': + cfg['shared_slab_pages'] = True + else: + cfg['shared_slab_pages'] = False + return cfg +def for_each_slab_page(prog): + PGSlab = 1 << prog.constant('PG_slab') + PGHead = 1 << prog.constant('PG_head') + + for page in for_each_page(prog): + try: + if page.flags.value_() & PGSlab: + yield page + except FaultError: + pass + + +# it doesn't find all objects, because by default full slabs are not +# placed to the full list. however, it can be used in certain cases. 
+def for_each_slab_page_fast(prog): + for s in list_for_each_entry('struct kmem_cache', + prog['slab_caches'].address_of_(), + 'list'): + if is_root_cache(s): + continue + + if s.cpu_partial: + for cpu in for_each_online_cpu(prog): + cpu_slab = per_cpu_ptr(s.cpu_slab, cpu) + if cpu_slab.page: + yield cpu_slab.page + + page = cpu_slab.partial + while page: + yield page + page = page.next + + for node in range(prog['nr_online_nodes'].value_()): + n = s.node[node] + + for page in list_for_each_entry('struct page', + n.partial.address_of_(), + 'slab_list'): + yield page + + for page in list_for_each_entry('struct page', + n.full.address_of_(), + 'slab_list'): + yield page + + def main(): parser = argparse.ArgumentParser(description=DESC, formatter_class=argparse.RawTextHelpFormatter) @@ -150,10 +208,43 @@ def main(): ' : tunables ' ' : slabdata ') - for s in list_for_each_entry('struct kmem_cache', - memcg.kmem_caches.address_of_(), - 'memcg_params.kmem_caches_node'): - cache_show(s, cfg) + if cfg['shared_slab_pages']: + memcg_ptrs = set() + stats = {} + caches = {} + + # find memcg pointers belonging to the specified cgroup + for ptr in list_for_each_entry('struct mem_cgroup_ptr', + memcg.kmem_memcg_ptr_list.address_of_(), + 'list'): + memcg_ptrs.add(ptr.value_()) + + # look over all slab pages belonging to non-root memcgs + # and look for objects belonging to the given memory cgroup + for page in for_each_slab_page(prog): + cache = page.slab_cache + if not cache or is_root_cache(cache): + continue + addr = cache.value_() + caches[addr] = cache + memcg_vec = page.mem_cgroup_vec + + if addr not in stats: + stats[addr] = 0 + + for i in range(oo_objects(cache)): + if memcg_vec[i].value_() in memcg_ptrs: + stats[addr] += 1 + + for addr in caches: + if stats[addr] > 0: + cache_show(caches[addr], cfg, stats[addr]) + + else: + for s in list_for_each_entry('struct kmem_cache', + memcg.kmem_caches.address_of_(), + 'memcg_params.kmem_caches_node'): + cache_show(s, cfg, None) main()

From patchwork Fri Oct 18 00:28:20 2019
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11197449
From: Roman Gushchin
Cc: Michal Hocko, Johannes Weiner, Shakeel Butt, Vladimir Davydov, Waiman Long, Christoph Lameter, Roman Gushchin
Subject: [PATCH 16/16] mm: slab: remove redundant check in memcg_accumulate_slabinfo()
Date: Thu, 17 Oct 2019 17:28:20 -0700
Message-ID: <20191018002820.307763-17-guro@fb.com>
In-Reply-To: <20191018002820.307763-1-guro@fb.com>
References: <20191018002820.307763-1-guro@fb.com>
memcg_accumulate_slabinfo() is never called with a non-root kmem_cache as its first argument, so the is_root_cache(s) check is redundant and can be removed without any functional change.

Signed-off-by: Roman Gushchin
---
 mm/slab_common.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c index 3dcd90ba7525..1c10c94343f6 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1105,9 +1105,6 @@ memcg_accumulate_slabinfo(struct kmem_cache *s, struct slabinfo *info) struct kmem_cache *c; struct slabinfo sinfo; - if (!is_root_cache(s)) - return; - c = memcg_cache(s); if (c) { memset(&sinfo, 0, sizeof(sinfo));
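The check is redundant because, under the shared slab page scheme introduced earlier in this series, every root cache carries at most one companion memcg cache, reachable through memcg_params.memcg_cache. A hypothetical drgn snippet (assuming the fields from this series, run in a drgn shell where prog is predefined) shows the same single-pointer relationship the new C code relies on:

    # Each root cache now has at most one memcg cache, so a single
    # pointer check replaces the old per-memcg iteration.
    from drgn.helpers.linux import list_for_each_entry

    for s in list_for_each_entry('struct kmem_cache',
                                 prog['slab_root_caches'].address_of_(),
                                 'root_caches_node'):
        c = s.memcg_params.memcg_cache
        if c:
            print(s.name.string_().decode('utf-8'),
                  '->', c.name.string_().decode('utf-8'))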