From: Roman Gushchin
To: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin
Subject: [PATCH v2 0/5] mm: memcg accounting of percpu memory
Date: Mon, 8 Jun 2020 16:08:14 -0700
Message-ID: <20200608230819.832349-1-guro@fb.com>
X-Patchwork-Id: 11594137

This patchset adds percpu memory accounting to memory cgroups. It's based on the rework of the slab controller and reuses concepts and features introduced for the per-object slab accounting.

Percpu memory is becoming more and more widely used by various subsystems, and the total amount of memory controlled by the percpu allocator can make up a good part of the total memory.

As an example, bpf maps can consume a lot of percpu memory, and they are created by a user. Also, some cgroup internals (e.g. memory controller statistics) can be quite large. On a machine with many CPUs and a large number of cgroups they can consume hundreds of megabytes.

So the lack of memcg accounting is creating a breach in the memory isolation. Similar to the slab memory, percpu memory should be accounted by default.

Percpu allocations by their nature are scattered over multiple pages, so they can't be tracked on a per-page basis. So the per-object tracking introduced by the new slab controller is reused.

The patchset implements charging of percpu allocations, adds memcg-level statistics, enables accounting for percpu allocations made by memory cgroup internals and provides some basic tests.

To implement the accounting of percpu memory without significant memory and performance overhead, the following approach is used: all accounted allocations are placed into a separate percpu chunk (or chunks). These chunks are similar to default chunks, except that they have an attached vector of pointers to obj_cgroup objects, which is big enough to hold a pointer for each allocated object. On allocation, if the allocation has to be accounted (__GFP_ACCOUNT is passed, the allocating process belongs to a non-root memory cgroup, etc.), the memory cgroup is charged, and if the maximum limit is not exceeded, the allocation is performed using a memcg-aware chunk. Otherwise -ENOMEM is returned or the allocation is forced over the limit, depending on gfp (as with any other kernel memory allocation). The memory cgroup information is saved in the obj_cgroup vector at the corresponding offset.
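
The charge-and-record step described above, together with the matching lookup on free, can be sketched as a small userspace model. This is only an illustration, not the code from the series: model_memcg, model_chunk, model_alloc(), model_free() and MODEL_UNITS are hypothetical names, the chunk is a simple bump allocator that never reuses space, and only the hard-limit -ENOMEM path is modeled (the "forced over the limit" case is left out).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_UNITS 64	/* hypothetical: allocation units per chunk */

/* Toy stand-in for a memory cgroup: a limit and a usage counter. */
struct model_memcg {
	size_t limit;
	size_t usage;
};

/*
 * Toy "memcg-aware" chunk: next to the data area it carries a vector
 * with one owner slot per allocation unit, indexed by offset.
 */
struct model_chunk {
	struct model_memcg *owner[MODEL_UNITS];
	size_t size[MODEL_UNITS];
	size_t next_free;	/* bump pointer; freed space is not reused */
};

/*
 * Charge the cgroup first, then allocate and record the owner at the
 * allocation's offset. Returns the offset, or -ENOMEM if the chunk is
 * full or the charge would exceed the cgroup limit.
 */
int model_alloc(struct model_chunk *c, struct model_memcg *cg, size_t units)
{
	int off;

	if (units == 0 || c->next_free + units > MODEL_UNITS)
		return -ENOMEM;
	if (cg->usage + units > cg->limit)
		return -ENOMEM;

	cg->usage += units;
	off = (int)c->next_free;
	c->owner[off] = cg;
	c->size[off] = units;
	c->next_free += units;
	return off;
}

/*
 * Look up the owner in the vector, uncharge it and clear the slot.
 * Returns the number of released units.
 */
size_t model_free(struct model_chunk *c, int off)
{
	struct model_memcg *cg;
	size_t units;

	assert(off >= 0 && off < MODEL_UNITS);
	cg = c->owner[off];
	assert(cg != NULL);

	units = c->size[off];
	cg->usage -= units;
	c->owner[off] = NULL;
	c->size[off] = 0;
	return units;
}
```

In this model a cgroup with limit 8 can take two 4-unit allocations, a third request fails with -ENOMEM, and freeing the first allocation drops its usage back to 4. Having the free path report how much was released keeps the uncharge amount in one place.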

On release, the memcg information is restored from the vector and the cgroup is uncharged.

Unaccounted allocations (at this point the absolute majority of all percpu allocations) are performed in the old way, so no additional overhead is expected.

To avoid pinning dying memory cgroups by outstanding allocations, the obj_cgroup API is used instead of directly saving memory cgroup pointers. obj_cgroup is basically a pointer to a memory cgroup with a standalone reference counter. The trick is that it can be atomically swapped to point at the parent cgroup, so that the original memory cgroup can be released prior to all objects which have been charged to it. Because all charges and statistics are fully recursive, it's perfectly correct to uncharge the parent cgroup instead. This scheme is used in the slab memory accounting, and percpu memory can just follow the scheme.

This version is based on top of v6 of the new slab controller patchset. The following patches are actually required by this series:
  mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state()
  mm: memcg: prepare for byte-sized vmstat items
  mm: memcg: convert vmstat slab counters to bytes
  mm: slub: implement SLUB version of obj_to_index()
  mm: memcontrol: decouple reference counting from page accounting
  mm: memcg/slab: obj_cgroup API

The whole series can be found here:
https://github.com/rgushchin/linux/pull/new/percpu_acc.1

v2:
  1) minor cosmetic fixes (Dennis)
  2) rebased on top of v6 of the slab controller patchset

v1:
  1) fixed a bug with gfp flags handling (Dennis)
  2) added some comments (Tejun and Dennis)
  3) rebased on top of v5 of the slab controller patchset

RFC:
  https://lore.kernel.org/linux-mm/20200519201806.2308480-1-guro@fb.com/T/#t

Roman Gushchin (5):
  percpu: return number of released bytes from pcpu_free_area()
  mm: memcg/percpu: account percpu memory to memory cgroups
  mm: memcg/percpu: per-memcg percpu memory statistics
  mm: memcg: charge memcg percpu memory to the parent cgroup
  kselftests: cgroup: add perpcu memory accounting test

 Documentation/admin-guide/cgroup-v2.rst    |   4 +
 include/linux/memcontrol.h                 |   8 +
 mm/memcontrol.c                            |  18 +-
 mm/percpu-internal.h                       |  55 +++++-
 mm/percpu-km.c                             |   5 +-
 mm/percpu-stats.c                          |  36 ++--
 mm/percpu-vm.c                             |   5 +-
 mm/percpu.c                                | 206 ++++++++++++++++++---
 tools/testing/selftests/cgroup/test_kmem.c |  70 ++++++-
 9 files changed, 358 insertions(+), 49 deletions(-)
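
The obj_cgroup indirection described in the cover letter can be illustrated with a single-threaded userspace model. It is a sketch under stated assumptions, not the kernel implementation: model_memcg, model_objcg, model_charge(), model_uncharge() and model_reparent() are hypothetical names, and the pointer swap here is a plain assignment, whereas the cover letter says the real swap is atomic.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cgroup with a hierarchical (recursive) usage counter. */
struct model_memcg {
	struct model_memcg *parent;
	size_t usage;	/* includes charges made to descendants */
};

/*
 * Toy obj_cgroup: an indirection between charged objects and the
 * cgroup, with its own reference count. Objects hold a reference to
 * this structure, never to the cgroup itself.
 */
struct model_objcg {
	struct model_memcg *memcg;
	unsigned int refs;
};

/* Charge the current target and every ancestor; take a reference. */
void model_charge(struct model_objcg *o, size_t bytes)
{
	struct model_memcg *m;

	for (m = o->memcg; m; m = m->parent)
		m->usage += bytes;
	o->refs++;
}

/* Uncharge whatever cgroup the objcg points at now; drop the reference. */
void model_uncharge(struct model_objcg *o, size_t bytes)
{
	struct model_memcg *m;

	assert(o->refs > 0);
	for (m = o->memcg; m; m = m->parent)
		m->usage -= bytes;
	o->refs--;
}

/*
 * On cgroup removal, redirect the objcg to the parent. Outstanding
 * objects keep their objcg reference, so the dying cgroup is no longer
 * pinned by them and can be released.
 */
void model_reparent(struct model_objcg *o)
{
	assert(o->memcg->parent != NULL);
	o->memcg = o->memcg->parent;
}
```

Because a charge made through the child was already counted in the parent, uncharging the parent after the swap leaves the parent's counter balanced. The dying child's own counter is not touched again, which does not matter in this model since the child is about to be released.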