From patchwork Mon Jun 8 23:08:15 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11594133
From: Roman Gushchin
To: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
CC: Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin
Subject: [PATCH v2 1/5] percpu: return number of released bytes from pcpu_free_area()
Date: Mon, 8 Jun 2020 16:08:15 -0700
Message-ID: <20200608230819.832349-2-guro@fb.com>
In-Reply-To: <20200608230819.832349-1-guro@fb.com>
References: <20200608230819.832349-1-guro@fb.com>

To implement accounting of percpu memory we need the information about the size of freed object. Return it from pcpu_free_area().
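
For illustration only, here is a minimal userspace sketch of the idea: a free routine that derives the size of an area from a boundary map and returns the number of bytes it released. It assumes a heavily simplified chunk model. All toy_* names and TOY_* constants are hypothetical and are not part of mm/percpu.c; the real allocator uses bitmaps, metadata blocks and locking that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model for illustration; not the kernel's structures. */
#define TOY_MIN_ALLOC_SIZE 4	/* bytes represented by one map entry (assumed) */
#define TOY_MAP_BITS 64		/* number of allocation units in the toy chunk */

struct toy_chunk {
	bool alloc_map[TOY_MAP_BITS];		/* true: unit is in use */
	bool bound_map[TOY_MAP_BITS + 1];	/* true: an area starts or ends here */
	int free_bytes;
};

static void toy_chunk_init(struct toy_chunk *chunk)
{
	memset(chunk, 0, sizeof(*chunk));
	chunk->bound_map[TOY_MAP_BITS] = true;	/* end-of-map sentinel */
	chunk->free_bytes = TOY_MAP_BITS * TOY_MIN_ALLOC_SIZE;
}

/* Mark [bit_off, bit_off + bits) as allocated; return its byte offset. */
static int toy_alloc_area(struct toy_chunk *chunk, int bit_off, int bits)
{
	int i;

	assert(bit_off >= 0 && bits > 0 && bit_off + bits <= TOY_MAP_BITS);

	for (i = bit_off; i < bit_off + bits; i++) {
		chunk->alloc_map[i] = true;
		chunk->bound_map[i] = false;	/* drop stale boundaries */
	}
	chunk->bound_map[bit_off] = true;
	chunk->bound_map[bit_off + bits] = true;
	chunk->free_bytes -= bits * TOY_MIN_ALLOC_SIZE;

	return bit_off * TOY_MIN_ALLOC_SIZE;
}

/* Free the area starting at byte offset off; return the bytes released. */
static int toy_free_area(struct toy_chunk *chunk, int off)
{
	int bit_off = off / TOY_MIN_ALLOC_SIZE;
	int end = bit_off + 1;
	int i, freed;

	assert(chunk->alloc_map[bit_off] && chunk->bound_map[bit_off]);

	/* The next boundary entry marks the end of this allocation. */
	while (!chunk->bound_map[end])
		end++;

	for (i = bit_off; i < end; i++)
		chunk->alloc_map[i] = false;

	freed = (end - bit_off) * TOY_MIN_ALLOC_SIZE;
	chunk->free_bytes += freed;

	return freed;
}
```

A caller can pass the returned value straight to an uncharge step instead of tracking allocation sizes separately, which is how the next patch in this series uses the return value of pcpu_free_area() in free_percpu().
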
Signed-off-by: Roman Gushchin
Acked-by: Dennis Zhou
---
 mm/percpu.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 696367b18222..aa36b78d45a6 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1211,11 +1211,14 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits,
  *
  * This function determines the size of an allocation to free using
  * the boundary bitmap and clears the allocation map.
+ *
+ * RETURNS:
+ * Number of freed bytes.
  */
-static void pcpu_free_area(struct pcpu_chunk *chunk, int off)
+static int pcpu_free_area(struct pcpu_chunk *chunk, int off)
 {
 	struct pcpu_block_md *chunk_md = &chunk->chunk_md;
-	int bit_off, bits, end, oslot;
+	int bit_off, bits, end, oslot, freed;
 
 	lockdep_assert_held(&pcpu_lock);
 	pcpu_stats_area_dealloc(chunk);
@@ -1230,8 +1233,10 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int off)
 	bits = end - bit_off;
 	bitmap_clear(chunk->alloc_map, bit_off, bits);
 
+	freed = bits * PCPU_MIN_ALLOC_SIZE;
+
 	/* update metadata */
-	chunk->free_bytes += bits * PCPU_MIN_ALLOC_SIZE;
+	chunk->free_bytes += freed;
 
 	/* update first free bit */
 	chunk_md->first_free = min(chunk_md->first_free, bit_off);
@@ -1239,6 +1244,8 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int off)
 	pcpu_block_update_hint_free(chunk, bit_off, bits);
 
 	pcpu_chunk_relocate(chunk, oslot);
+
+	return freed;
 }
 
 static void pcpu_init_md_block(struct pcpu_block_md *block, int nr_bits)

From patchwork Mon Jun 8 23:08:16 2020
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11594141
From: Roman Gushchin
To: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
CC: Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin
Subject: [PATCH v2 2/5] mm: memcg/percpu: account percpu memory to memory cgroups
Date: Mon, 8 Jun 2020 16:08:16 -0700
Message-ID: <20200608230819.832349-3-guro@fb.com>
In-Reply-To: <20200608230819.832349-1-guro@fb.com>
References: <20200608230819.832349-1-guro@fb.com>

Percpu memory is becoming more and more widely used by various subsystems, and the total amount of memory controlled by the percpu allocator can make a good part of the total memory.

As an example, bpf maps can consume a lot of percpu memory, and they are created by a user. Also, some cgroup internals (e.g. memory controller statistics) can be quite large. On a machine with many CPUs and a large number of cgroups they can consume hundreds of megabytes.

So the lack of memcg accounting is creating a breach in the memory isolation. Similar to the slab memory, percpu memory should be accounted by default.

To implement the percpu accounting it's possible to take the slab memory accounting as a model to follow. Let's introduce two types of percpu chunks: root and memcg. What makes memcg chunks different is an additional space allocated to store memcg membership information. If __GFP_ACCOUNT is passed on allocation, a memcg chunk should be used. If it's possible to charge the corresponding size to the target memory cgroup, allocation is performed, and the memcg ownership data is recorded. System-wide allocations are performed using root chunks, so there is no additional memory overhead.

To implement a fast reparenting of percpu memory on memcg removal, we don't store mem_cgroup pointers directly: instead we use the obj_cgroup API, introduced for slab accounting.

Signed-off-by: Roman Gushchin
Acked-by: Dennis Zhou
---
 mm/percpu-internal.h |  55 ++++++++++++-
 mm/percpu-km.c       |   5 +-
 mm/percpu-stats.c    |  36 +++++----
 mm/percpu-vm.c       |   5 +-
 mm/percpu.c          | 183 ++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 244 insertions(+), 40 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h index 0468ba500bd4..7983455842ff 100644 --- a/mm/percpu-internal.h +++ b/mm/percpu-internal.h @@ -5,6 +5,25 @@ #include #include +/* + * There are two chunk types: root and memcg-aware. + * Chunks of each type have separate slots list. + * + * Memcg-aware chunks have an attached vector of obj_cgroup pointers, which is + * used to store memcg membership data of a percpu object. 
Obj_cgroups are + * ref-counted pointers to a memory cgroup with an ability to switch dynamically + * to the parent memory cgroup. This allows to reclaim a deleted memory cgroup + * without reclaiming of all outstanding objects, which hold a reference at it. + */ +enum pcpu_chunk_type { + PCPU_CHUNK_ROOT, +#ifdef CONFIG_MEMCG_KMEM + PCPU_CHUNK_MEMCG, +#endif + PCPU_NR_CHUNK_TYPES, + PCPU_FAIL_ALLOC = PCPU_NR_CHUNK_TYPES +}; + /* * pcpu_block_md is the metadata block struct. * Each chunk's bitmap is split into a number of full blocks. @@ -54,6 +73,9 @@ struct pcpu_chunk { int end_offset; /* additional area required to have the region end page aligned */ +#ifdef CONFIG_MEMCG_KMEM + struct obj_cgroup **obj_cgroups; /* vector of object cgroups */ +#endif int nr_pages; /* # of pages served by this chunk */ int nr_populated; /* # of populated pages */ @@ -63,7 +85,7 @@ struct pcpu_chunk { extern spinlock_t pcpu_lock; -extern struct list_head *pcpu_slot; +extern struct list_head *pcpu_chunk_lists; extern int pcpu_nr_slots; extern int pcpu_nr_empty_pop_pages; @@ -106,6 +128,37 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk) return pcpu_nr_pages_to_map_bits(chunk->nr_pages); } +#ifdef CONFIG_MEMCG_KMEM +static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk) +{ + if (chunk->obj_cgroups) + return PCPU_CHUNK_MEMCG; + return PCPU_CHUNK_ROOT; +} + +static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type) +{ + return chunk_type == PCPU_CHUNK_MEMCG; +} + +#else +static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk) +{ + return PCPU_CHUNK_ROOT; +} + +static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type) +{ + return false; +} +#endif + +static struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type) +{ + return &pcpu_chunk_lists[pcpu_nr_slots * + pcpu_is_memcg_chunk(chunk_type)]; +} + #ifdef CONFIG_PERCPU_STATS #include diff --git a/mm/percpu-km.c b/mm/percpu-km.c index 20d2b69a13b0..35c9941077ee 100644 
--- a/mm/percpu-km.c +++ b/mm/percpu-km.c @@ -44,7 +44,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, /* nada */ } -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT; struct pcpu_chunk *chunk; @@ -52,7 +53,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) unsigned long flags; int i; - chunk = pcpu_alloc_chunk(gfp); + chunk = pcpu_alloc_chunk(type, gfp); if (!chunk) return NULL; diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c index 32558063c3f9..c8400a2adbc2 100644 --- a/mm/percpu-stats.c +++ b/mm/percpu-stats.c @@ -34,11 +34,15 @@ static int find_max_nr_alloc(void) { struct pcpu_chunk *chunk; int slot, max_nr_alloc; + enum pcpu_chunk_type type; max_nr_alloc = 0; - for (slot = 0; slot < pcpu_nr_slots; slot++) - list_for_each_entry(chunk, &pcpu_slot[slot], list) - max_nr_alloc = max(max_nr_alloc, chunk->nr_alloc); + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) + for (slot = 0; slot < pcpu_nr_slots; slot++) + list_for_each_entry(chunk, &pcpu_chunk_list(type)[slot], + list) + max_nr_alloc = max(max_nr_alloc, + chunk->nr_alloc); return max_nr_alloc; } @@ -129,6 +133,9 @@ static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk, P("cur_min_alloc", cur_min_alloc); P("cur_med_alloc", cur_med_alloc); P("cur_max_alloc", cur_max_alloc); +#ifdef CONFIG_MEMCG_KMEM + P("memcg_aware", pcpu_is_memcg_chunk(pcpu_chunk_type(chunk))); +#endif seq_putc(m, '\n'); } @@ -137,6 +144,7 @@ static int percpu_stats_show(struct seq_file *m, void *v) struct pcpu_chunk *chunk; int slot, max_nr_alloc; int *buffer; + enum pcpu_chunk_type type; alloc_buffer: spin_lock_irq(&pcpu_lock); @@ -202,18 +210,18 @@ static int percpu_stats_show(struct seq_file *m, void *v) chunk_map_stats(m, pcpu_reserved_chunk, buffer); } - for (slot = 0; slot < pcpu_nr_slots; slot++) { - list_for_each_entry(chunk, 
&pcpu_slot[slot], list) { - if (chunk == pcpu_first_chunk) { - seq_puts(m, "Chunk: <- First Chunk\n"); - chunk_map_stats(m, chunk, buffer); - - - } else { - seq_puts(m, "Chunk:\n"); - chunk_map_stats(m, chunk, buffer); + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) { + for (slot = 0; slot < pcpu_nr_slots; slot++) { + list_for_each_entry(chunk, &pcpu_chunk_list(type)[slot], + list) { + if (chunk == pcpu_first_chunk) { + seq_puts(m, "Chunk: <- First Chunk\n"); + chunk_map_stats(m, chunk, buffer); + } else { + seq_puts(m, "Chunk:\n"); + chunk_map_stats(m, chunk, buffer); + } } - } } diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c index a2b395acef89..e46f7a6917f9 100644 --- a/mm/percpu-vm.c +++ b/mm/percpu-vm.c @@ -328,12 +328,13 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, pcpu_free_pages(chunk, pages, page_start, page_end); } -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { struct pcpu_chunk *chunk; struct vm_struct **vms; - chunk = pcpu_alloc_chunk(gfp); + chunk = pcpu_alloc_chunk(type, gfp); if (!chunk) return NULL; diff --git a/mm/percpu.c b/mm/percpu.c index aa36b78d45a6..8ebd9fe30430 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -37,9 +37,14 @@ * takes care of normal allocations. * * The allocator organizes chunks into lists according to free size and - * tries to allocate from the fullest chunk first. Each chunk is managed - * by a bitmap with metadata blocks. The allocation map is updated on - * every allocation and free to reflect the current state while the boundary + * memcg-awareness. To make a percpu allocation memcg-aware the __GFP_ACCOUNT + * flag should be passed. All memcg-aware allocations are sharing one set + * of chunks and all unaccounted allocations and allocations performed + * by processes belonging to the root memory cgroup are using the second set. + * + * The allocator tries to allocate from the fullest chunk first. 
Each chunk + * is managed by a bitmap with metadata blocks. The allocation map is updated + * on every allocation and free to reflect the current state while the boundary * map is only updated on allocation. Each metadata block contains * information to help mitigate the need to iterate over large portions * of the bitmap. The reverse mapping from page to chunk is stored in @@ -81,6 +86,7 @@ #include #include #include +#include #include #include @@ -160,7 +166,7 @@ struct pcpu_chunk *pcpu_reserved_chunk __ro_after_init; DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */ static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */ -struct list_head *pcpu_slot __ro_after_init; /* chunk list slots */ +struct list_head *pcpu_chunk_lists __ro_after_init; /* chunk list slots */ /* chunks which need their map areas extended, protected by pcpu_lock */ static LIST_HEAD(pcpu_map_extend_chunks); @@ -500,6 +506,9 @@ static void __pcpu_chunk_move(struct pcpu_chunk *chunk, int slot, bool move_front) { if (chunk != pcpu_reserved_chunk) { + struct list_head *pcpu_slot; + + pcpu_slot = pcpu_chunk_list(pcpu_chunk_type(chunk)); if (move_front) list_move(&chunk->list, &pcpu_slot[slot]); else @@ -1341,6 +1350,10 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr, panic("%s: Failed to allocate %zu bytes\n", __func__, alloc_size); +#ifdef CONFIG_MEMCG_KMEM + /* first chunk isn't memcg-aware */ + chunk->obj_cgroups = NULL; +#endif pcpu_init_md_blocks(chunk); /* manage populated page bitmap */ @@ -1380,7 +1393,7 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr, return chunk; } -static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_alloc_chunk(enum pcpu_chunk_type type, gfp_t gfp) { struct pcpu_chunk *chunk; int region_bits; @@ -1408,6 +1421,16 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) if (!chunk->md_blocks) goto md_blocks_fail; +#ifdef 
CONFIG_MEMCG_KMEM + if (pcpu_is_memcg_chunk(type)) { + chunk->obj_cgroups = + pcpu_mem_zalloc(pcpu_chunk_map_bits(chunk) * + sizeof(struct obj_cgroup *), gfp); + if (!chunk->obj_cgroups) + goto objcg_fail; + } +#endif + pcpu_init_md_blocks(chunk); /* init metadata */ @@ -1415,6 +1438,8 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) return chunk; +objcg_fail: + pcpu_mem_free(chunk->md_blocks); md_blocks_fail: pcpu_mem_free(chunk->bound_map); bound_map_fail: @@ -1429,6 +1454,9 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk) { if (!chunk) return; +#ifdef CONFIG_MEMCG_KMEM + pcpu_mem_free(chunk->obj_cgroups); +#endif pcpu_mem_free(chunk->md_blocks); pcpu_mem_free(chunk->bound_map); pcpu_mem_free(chunk->alloc_map); @@ -1505,7 +1533,8 @@ static int pcpu_populate_chunk(struct pcpu_chunk *chunk, int page_start, int page_end, gfp_t gfp); static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, int page_start, int page_end); -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp); +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp); static void pcpu_destroy_chunk(struct pcpu_chunk *chunk); static struct page *pcpu_addr_to_page(void *addr); static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai); @@ -1547,6 +1576,77 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr) return pcpu_get_page_chunk(pcpu_addr_to_page(addr)); } +#ifdef CONFIG_MEMCG_KMEM +static enum pcpu_chunk_type pcpu_memcg_pre_alloc_hook(size_t size, gfp_t gfp, + struct obj_cgroup **objcgp) +{ + struct obj_cgroup *objcg; + + if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT) || + memcg_kmem_bypass()) + return PCPU_CHUNK_ROOT; + + objcg = get_obj_cgroup_from_current(); + if (!objcg) + return PCPU_CHUNK_ROOT; + + if (obj_cgroup_charge(objcg, gfp, size * num_possible_cpus())) { + obj_cgroup_put(objcg); + return PCPU_FAIL_ALLOC; + } + + *objcgp = objcg; + return PCPU_CHUNK_MEMCG; +} + +static void pcpu_memcg_post_alloc_hook(struct 
obj_cgroup *objcg, + struct pcpu_chunk *chunk, int off, + size_t size) +{ + if (!objcg) + return; + + if (chunk) { + chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg; + } else { + obj_cgroup_uncharge(objcg, size * num_possible_cpus()); + obj_cgroup_put(objcg); + } +} + +static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size) +{ + struct obj_cgroup *objcg; + + if (!pcpu_is_memcg_chunk(pcpu_chunk_type(chunk))) + return; + + objcg = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT]; + chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL; + + obj_cgroup_uncharge(objcg, size * num_possible_cpus()); + + obj_cgroup_put(objcg); +} + +#else /* CONFIG_MEMCG_KMEM */ +static enum pcpu_chunk_type pcpu_memcg_pre_alloc_hook(size_t size, gfp_t gfp, + struct mem_cgroup **memcgp) +{ + return PCPU_CHUNK_ROOT; +} + +static void pcpu_memcg_post_alloc_hook(struct mem_cgroup *memcg, + struct pcpu_chunk *chunk, int off, + size_t size) +{ +} + +static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size) +{ +} +#endif /* CONFIG_MEMCG_KMEM */ + /** * pcpu_alloc - the percpu allocator * @size: size of area to allocate in bytes @@ -1568,6 +1668,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, gfp_t pcpu_gfp; bool is_atomic; bool do_warn; + enum pcpu_chunk_type type; + struct list_head *pcpu_slot; + struct obj_cgroup *objcg = NULL; static int warn_limit = 10; struct pcpu_chunk *chunk, *next; const char *err; @@ -1602,16 +1705,23 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, return NULL; } + type = pcpu_memcg_pre_alloc_hook(size, gfp, &objcg); + if (unlikely(type == PCPU_FAIL_ALLOC)) + return NULL; + pcpu_slot = pcpu_chunk_list(type); + if (!is_atomic) { /* * pcpu_balance_workfn() allocates memory under this mutex, * and it may wait for memory reclaim. Allow current task * to become OOM victim, in case of memory pressure. 
*/ - if (gfp & __GFP_NOFAIL) + if (gfp & __GFP_NOFAIL) { mutex_lock(&pcpu_alloc_mutex); - else if (mutex_lock_killable(&pcpu_alloc_mutex)) + } else if (mutex_lock_killable(&pcpu_alloc_mutex)) { + pcpu_memcg_post_alloc_hook(objcg, NULL, 0, size); return NULL; + } } spin_lock_irqsave(&pcpu_lock, flags); @@ -1666,7 +1776,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, } if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) { - chunk = pcpu_create_chunk(pcpu_gfp); + chunk = pcpu_create_chunk(type, pcpu_gfp); if (!chunk) { err = "failed to allocate new chunk"; goto fail; @@ -1723,6 +1833,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, trace_percpu_alloc_percpu(reserved, is_atomic, size, align, chunk->base_addr, off, ptr); + pcpu_memcg_post_alloc_hook(objcg, chunk, off, size); + return ptr; fail_unlock: @@ -1744,6 +1856,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, } else { mutex_unlock(&pcpu_alloc_mutex); } + + pcpu_memcg_post_alloc_hook(objcg, NULL, 0, size); + return NULL; } @@ -1803,8 +1918,8 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align) } /** - * pcpu_balance_workfn - manage the amount of free chunks and populated pages - * @work: unused + * __pcpu_balance_workfn - manage the amount of free chunks and populated pages + * @type: chunk type * * Reclaim all fully free chunks except for the first one. This is also * responsible for maintaining the pool of empty populated pages. However, @@ -1813,11 +1928,12 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align) * allocation causes the failure as it is possible that requests can be * serviced from already backed regions. 
*/ -static void pcpu_balance_workfn(struct work_struct *work) +static void __pcpu_balance_workfn(enum pcpu_chunk_type type) { /* gfp flags passed to underlying allocators */ const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; LIST_HEAD(to_free); + struct list_head *pcpu_slot = pcpu_chunk_list(type); struct list_head *free_head = &pcpu_slot[pcpu_nr_slots - 1]; struct pcpu_chunk *chunk, *next; int slot, nr_to_pop, ret; @@ -1915,7 +2031,7 @@ static void pcpu_balance_workfn(struct work_struct *work) if (nr_to_pop) { /* ran out of chunks to populate, create a new one and retry */ - chunk = pcpu_create_chunk(gfp); + chunk = pcpu_create_chunk(type, gfp); if (chunk) { spin_lock_irq(&pcpu_lock); pcpu_chunk_relocate(chunk, -1); @@ -1927,6 +2043,20 @@ static void pcpu_balance_workfn(struct work_struct *work) mutex_unlock(&pcpu_alloc_mutex); } +/** + * pcpu_balance_workfn - manage the amount of free chunks and populated pages + * @work: unused + * + * Call __pcpu_balance_workfn() for each chunk type. 
+ */ +static void pcpu_balance_workfn(struct work_struct *work) +{ + enum pcpu_chunk_type type; + + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) + __pcpu_balance_workfn(type); +} + /** * free_percpu - free percpu area * @ptr: pointer to area to free @@ -1941,8 +2071,9 @@ void free_percpu(void __percpu *ptr) void *addr; struct pcpu_chunk *chunk; unsigned long flags; - int off; + int size, off; bool need_balance = false; + struct list_head *pcpu_slot; if (!ptr) return; @@ -1956,7 +2087,11 @@ void free_percpu(void __percpu *ptr) chunk = pcpu_chunk_addr_search(addr); off = addr - chunk->base_addr; - pcpu_free_area(chunk, off); + size = pcpu_free_area(chunk, off); + + pcpu_slot = pcpu_chunk_list(pcpu_chunk_type(chunk)); + + pcpu_memcg_free_hook(chunk, off, size); /* if there are more than one fully free chunks, wake up grim reaper */ if (chunk->free_bytes == pcpu_unit_size) { @@ -2267,6 +2402,7 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, int map_size; unsigned long tmp_addr; size_t alloc_size; + enum pcpu_chunk_type type; #define PCPU_SETUP_BUG_ON(cond) do { \ if (unlikely(cond)) { \ @@ -2384,13 +2520,18 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, * empty chunks. 
*/ pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2; - pcpu_slot = memblock_alloc(pcpu_nr_slots * sizeof(pcpu_slot[0]), - SMP_CACHE_BYTES); - if (!pcpu_slot) + pcpu_chunk_lists = memblock_alloc(pcpu_nr_slots * + sizeof(pcpu_chunk_lists[0]) * + PCPU_NR_CHUNK_TYPES, + SMP_CACHE_BYTES); + if (!pcpu_chunk_lists) panic("%s: Failed to allocate %zu bytes\n", __func__, - pcpu_nr_slots * sizeof(pcpu_slot[0])); - for (i = 0; i < pcpu_nr_slots; i++) - INIT_LIST_HEAD(&pcpu_slot[i]); + pcpu_nr_slots * sizeof(pcpu_chunk_lists[0]) * + PCPU_NR_CHUNK_TYPES); + + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) + for (i = 0; i < pcpu_nr_slots; i++) + INIT_LIST_HEAD(&pcpu_chunk_list(type)[i]); /* * The end of the static region needs to be aligned with the From patchwork Mon Jun 8 23:08:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 11594139 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9869138C for ; Mon, 8 Jun 2020 23:10:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8B8B22100A for ; Mon, 8 Jun 2020 23:10:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b="SB4oHBSF" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B8B22100A Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=fb.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BB5F26B0030; Mon, 8 Jun 2020 19:10:53 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B66F46B0033; Mon, 8 Jun 2020 19:10:53 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: 
From: Roman Gushchin
To: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
CC: Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin
Subject: [PATCH v2 3/5] mm: memcg/percpu: per-memcg percpu memory statistics
Date: Mon, 8 Jun 2020 16:08:17 -0700
Message-ID: <20200608230819.832349-4-guro@fb.com>
In-Reply-To: <20200608230819.832349-1-guro@fb.com>
References: <20200608230819.832349-1-guro@fb.com>

Percpu memory can represent a noticeable chunk of the total memory
consumption, especially on big machines with many CPUs. Let's track
percpu memory usage for each memcg and display it in memory.stat.

A percpu allocation is usually scattered over multiple pages (and nodes),
and can be significantly smaller than a page. So let's add a byte-sized
counter on the memcg level: MEMCG_PERCPU_B. The byte-sized vmstat
infrastructure created for slabs can be reused as is for the percpu case.

Signed-off-by: Roman Gushchin
Acked-by: Dennis Zhou
---
 Documentation/admin-guide/cgroup-v2.rst |  4 ++++
 include/linux/memcontrol.h              |  8 ++++++++
 mm/memcontrol.c                         |  4 +++-
 mm/percpu.c                             | 10 ++++++++++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index ce3e05e41724..7c1e784239bf 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1274,6 +1274,10 @@ PAGE_SIZE multiple when read back.
 		Amount of memory used for storing in-kernel data
 		structures.

+	  percpu
+		Amount of memory used for storing per-cpu kernel
+		data structures.
+
 	  sock
 		Amount of memory used in network transmission buffers

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index eede46c43573..7ed3af71a6fb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -32,11 +32,19 @@ struct kmem_cache;
 enum memcg_stat_item {
 	MEMCG_SWAP = NR_VM_NODE_STAT_ITEMS,
 	MEMCG_SOCK,
+	MEMCG_PERCPU_B,
 	/* XXX: why are these zone and not node counters? */
 	MEMCG_KERNEL_STACK_KB,
 	MEMCG_NR_STAT,
 };

+static __always_inline bool memcg_stat_item_in_bytes(enum memcg_stat_item item)
+{
+	if (item == MEMCG_PERCPU_B)
+		return true;
+	return vmstat_item_in_bytes(item);
+}
+
 enum memcg_memory_event {
 	MEMCG_LOW,
 	MEMCG_HIGH,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 93b2e73ef2f7..839b4d890a90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -783,7 +783,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 	if (mem_cgroup_disabled())
 		return;

-	if (vmstat_item_in_bytes(idx))
+	if (memcg_stat_item_in_bytes(idx))
 		threshold <<= PAGE_SHIFT;

 	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
@@ -1490,6 +1490,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	seq_buf_printf(&s, "slab %llu\n",
 		       (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) +
 			     memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B)));
+	seq_buf_printf(&s, "percpu %llu\n",
+		       (u64)memcg_page_state(memcg, MEMCG_PERCPU_B));
 	seq_buf_printf(&s, "sock %llu\n",
 		       (u64)memcg_page_state(memcg, MEMCG_SOCK) *
 		       PAGE_SIZE);
diff --git a/mm/percpu.c b/mm/percpu.c
index 8ebd9fe30430..18d3d049bf91 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1608,6 +1608,11 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,

 	if (chunk) {
 		chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg;
+
+		rcu_read_lock();
+		mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_PERCPU_B,
+				size * num_possible_cpus());
+		rcu_read_unlock();
 	} else {
 		obj_cgroup_uncharge(objcg, size * num_possible_cpus());
 		obj_cgroup_put(objcg);
@@ -1626,6 +1631,11 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)

 	obj_cgroup_uncharge(objcg, size * num_possible_cpus());

+	rcu_read_lock();
+	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_PERCPU_B,
+			-(size * num_possible_cpus()));
+	rcu_read_unlock();
+
 	obj_cgroup_put(objcg);
 }


From patchwork Mon Jun 8 23:08:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11594131
From: Roman Gushchin
To: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
CC: Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin
Subject: [PATCH v2 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup
Date: Mon, 8 Jun 2020 16:08:18 -0700
Message-ID: <20200608230819.832349-5-guro@fb.com>
In-Reply-To: <20200608230819.832349-1-guro@fb.com>
References: <20200608230819.832349-1-guro@fb.com>

Memory cgroups are using large chunks of percpu memory to store vmstat
data. Yet this memory is not accounted at all, so in the case when there
are many (dying) cgroups, it's not exactly clear where all the memory is.

Because the size of memory cgroup internal structures can dramatically
exceed the size of object or page which is pinning it in the memory, it's
not a good idea to simply ignore it. It actually breaks the isolation
between cgroups.

Let's account the consumed percpu memory to the parent cgroup.

Signed-off-by: Roman Gushchin
Acked-by: Dennis Zhou
---
 mm/memcontrol.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 839b4d890a90..638ad499a9e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5069,13 +5069,15 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;

-	pn->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
+	pn->lruvec_stat_local = alloc_percpu_gfp(struct lruvec_stat,
+						 GFP_KERNEL_ACCOUNT);
 	if (!pn->lruvec_stat_local) {
 		kfree(pn);
 		return 1;
 	}

-	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
+	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
+					       GFP_KERNEL_ACCOUNT);
 	if (!pn->lruvec_stat_cpu) {
 		free_percpu(pn->lruvec_stat_local);
 		kfree(pn);
@@ -5149,11 +5151,13 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 		goto fail;
 	}

-	memcg->vmstats_local = alloc_percpu(struct memcg_vmstats_percpu);
+	memcg->vmstats_local = alloc_percpu_gfp(struct memcg_vmstats_percpu,
+						GFP_KERNEL_ACCOUNT);
 	if (!memcg->vmstats_local)
 		goto fail;

-	memcg->vmstats_percpu = alloc_percpu(struct memcg_vmstats_percpu);
+	memcg->vmstats_percpu = alloc_percpu_gfp(struct memcg_vmstats_percpu,
+						 GFP_KERNEL_ACCOUNT);
 	if (!memcg->vmstats_percpu)
 		goto fail;

@@ -5202,7 +5206,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg;
 	long error = -ENOMEM;

+	memalloc_use_memcg(parent);
 	memcg = mem_cgroup_alloc();
+	memalloc_unuse_memcg();
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);


From patchwork Mon Jun 8 23:08:19 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 11594135
From: Roman Gushchin
To: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
CC: Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin
Subject: [PATCH v2 5/5] kselftests: cgroup: add perpcu memory accounting test
Date: Mon, 8 Jun 2020 16:08:19 -0700
Message-ID: <20200608230819.832349-6-guro@fb.com>
In-Reply-To: <20200608230819.832349-1-guro@fb.com>
References: <20200608230819.832349-1-guro@fb.com>

Add a simple test to check the percpu memory accounting.
The test creates a cgroup tree with 1000 child cgroups
and checks values of memory.current and memory.stat::percpu.

Signed-off-by: Roman Gushchin
---
 tools/testing/selftests/cgroup/test_kmem.c | 70 +++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 5224dae216e5..0941aa16157e 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -18,6 +18,15 @@

 #include "cgroup_util.h"

+/*
+ * Memory cgroup charging and vmstat data aggregation is performed using
+ * percpu batches 32 pages big (look at MEMCG_CHARGE_BATCH). So the maximum
+ * discrepancy between charge and vmstat entries is number of cpus multiplied
+ * by 32 pages multiplied by 2.
+ */
+#define MAX_VMSTAT_ERROR (4096 * 32 * 2 * get_nprocs())
+
+
 static int alloc_dcache(const char *cgroup, void *arg)
 {
 	unsigned long i;
@@ -180,7 +189,7 @@ static int test_kmem_memcg_deletion(const char *root)
 		goto cleanup;

 	sum = slab + anon + file + kernel_stack;
-	if (abs(sum - current) < 4096 * 32 * 2 * get_nprocs()) {
+	if (abs(sum - current) < MAX_VMSTAT_ERROR) {
 		ret = KSFT_PASS;
 	} else {
 		printf("memory.current = %ld\n", current);
@@ -331,6 +340,64 @@ static int test_kmem_dead_cgroups(const char *root)
 	return ret;
 }

+/*
+ * This test creates a sub-tree with 1000 memory cgroups.
+ * Then it checks that the memory.current on the parent level
+ * is greater than 0 and approximately matches the percpu value
+ * from memory.stat.
+ */
+static int test_percpu_basic(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *parent, *child;
+	long current, percpu;
+	int i;
+
+	parent = cg_name(root, "percpu_basic_test");
+	if (!parent)
+		goto cleanup;
+
+	if (cg_create(parent))
+		goto cleanup;
+
+	if (cg_write(parent, "cgroup.subtree_control", "+memory"))
+		goto cleanup;
+
+	for (i = 0; i < 1000; i++) {
+		child = cg_name_indexed(parent, "child", i);
+		if (!child)
+			return -1;
+
+		if (cg_create(child))
+			goto cleanup_children;
+
+		free(child);
+	}
+
+	current = cg_read_long(parent, "memory.current");
+	percpu = cg_read_key_long(parent, "memory.stat", "percpu ");
+
+	if (current > 0 && percpu > 0 && abs(current - percpu) <
+	    MAX_VMSTAT_ERROR)
+		ret = KSFT_PASS;
+	else
+		printf("memory.current %ld\npercpu %ld\n",
+		       current, percpu);
+
+cleanup_children:
+	for (i = 0; i < 1000; i++) {
+		child = cg_name_indexed(parent, "child", i);
+		cg_destroy(child);
+		free(child);
+	}
+
+cleanup:
+	cg_destroy(parent);
+	free(parent);
+
+	return ret;
+}
+
 #define T(x) { x, #x }
 struct kmem_test {
 	int (*fn)(const char *root);
@@ -341,6 +408,7 @@ struct kmem_test {
 	T(test_kmem_proc_kpagecgroup),
 	T(test_kmem_kernel_stacks),
 	T(test_kmem_dead_cgroups),
+	T(test_percpu_basic),
 };
 #undef T
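
The comparison that test_percpu_basic performs can be sketched in isolation as plain user-space C. This is only an illustration under stated assumptions, not code from the series: parse_stat_key(), max_vmstat_error() and percpu_matches_current() are hypothetical names, the 4096-byte page size and 32-page batch simply mirror the constants in MAX_VMSTAT_ERROR, and the memory.stat contents are passed in as a string instead of being read from cgroupfs.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed constants, mirroring MAX_VMSTAT_ERROR in the selftest. */
#define SKETCH_PAGE_SIZE	4096L
#define SKETCH_CHARGE_BATCH	32L

/*
 * Hypothetical helper: scan memory.stat-style text ("key value\n" lines)
 * for a line that starts with @key and return its value, or -1 if no
 * such line exists. @key includes the trailing space, e.g. "percpu ".
 */
static long parse_stat_key(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0)
			return strtol(line + klen, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}

/* Tolerance in bytes: page size * batch size * 2 * number of CPUs. */
static long max_vmstat_error(long ncpus)
{
	return SKETCH_PAGE_SIZE * SKETCH_CHARGE_BATCH * 2 * ncpus;
}

/*
 * Same condition as in the test: both values must be positive and
 * differ by less than the batching tolerance.
 */
static bool percpu_matches_current(long current, long percpu, long ncpus)
{
	return current > 0 && percpu > 0 &&
	       labs(current - percpu) < max_vmstat_error(ncpus);
}
```

With 4 CPUs the tolerance is 4096 * 32 * 2 * 4 = 1048576 bytes. The sketch uses labs() because both operands are long; the selftest itself calls abs().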