From patchwork Tue Oct 24 13:46:04 2023
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13434576
Date: Tue, 24 Oct 2023 06:46:04 -0700
In-Reply-To: <20231024134637.3120277-1-surenb@google.com>
References: <20231024134637.3120277-1-surenb@google.com>
Message-ID: <20231024134637.3120277-8-surenb@google.com>
Subject: [PATCH v2 07/39] mm: introduce slabobj_ext to support slab object extensions
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: kent.overstreet@linux.dev, mhocko@suse.com, vbabka@suse.cz,
 hannes@cmpxchg.org, roman.gushchin@linux.dev, mgorman@suse.de,
 dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
 corbet@lwn.net, void@manifault.com, peterz@infradead.org,
 juri.lelli@redhat.com, ldufour@linux.ibm.com, catalin.marinas@arm.com,
 will@kernel.org, arnd@arndb.de, tglx@linutronix.de, mingo@redhat.com,
 dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com,
 david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org, masahiroy@kernel.org,
 nathan@kernel.org, dennis@kernel.org, tj@kernel.org, muchun.song@linux.dev,
 rppt@kernel.org, paulmck@kernel.org, pasha.tatashin@soleen.com,
 yosryahmed@google.com, yuzhao@google.com, dhowells@redhat.com,
 hughd@google.com, andreyknvl@gmail.com, keescook@chromium.org,
 ndesaulniers@google.com, vvvvvv@google.com, gregkh@linuxfoundation.org,
 ebiggers@google.com, ytcoode@gmail.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 bristot@redhat.com, vschneid@redhat.com, cl@linux.com, penberg@kernel.org,
 iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com, glider@google.com,
 elver@google.com, dvyukov@google.com, shakeelb@google.com,
 songmuchun@bytedance.com, jbaron@akamai.com, rientjes@google.com,
 minchan@google.com, kaleshsingh@google.com, surenb@google.com,
 kernel-team@android.com, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
 linux-arch@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-modules@vger.kernel.org,
 kasan-dev@googlegroups.com, cgroups@vger.kernel.org

Currently slab pages can store only vectors of obj_cgroup pointers in
page->memcg_data. Introduce slabobj_ext structure to allow more data to be
stored for each slab object. Wrap obj_cgroup into slabobj_ext to support
current functionality while allowing slabobj_ext to be extended in the
future.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/memcontrol.h |  20 +++--
 include/linux/mm_types.h   |   4 +-
 init/Kconfig               |   4 +
 mm/kfence/core.c           |  14 ++--
 mm/kfence/kfence.h         |   4 +-
 mm/memcontrol.c            |  56 ++------------
 mm/page_owner.c            |   2 +-
 mm/slab.h                  | 148 +++++++++++++++++++++++++------------
 mm/slab_common.c           |  47 ++++++++++++
 9 files changed, 185 insertions(+), 114 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e4e24da16d2c..4b17ebb7e723 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -346,8 +346,8 @@ struct mem_cgroup {
 extern struct mem_cgroup *root_mem_cgroup;
 
 enum page_memcg_data_flags {
-	/* page->memcg_data is a pointer to an objcgs vector */
-	MEMCG_DATA_OBJCGS = (1UL << 0),
+	/* page->memcg_data is a pointer to a slabobj_ext vector */
+	MEMCG_DATA_OBJEXTS = (1UL << 0),
 	/* page has been accounted as a non-slab kernel page */
 	MEMCG_DATA_KMEM = (1UL << 1),
 	/* the next bit after the last actual flag */
@@ -385,7 +385,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
 	unsigned long memcg_data = folio->memcg_data;
 
 	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);
 
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -406,7 +406,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
 	unsigned long memcg_data = folio->memcg_data;
 
 	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);
 
 	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -503,7 +503,7 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
 	 */
 	unsigned long memcg_data = READ_ONCE(folio->memcg_data);
 
-	if (memcg_data & MEMCG_DATA_OBJCGS)
+	if (memcg_data & MEMCG_DATA_OBJEXTS)
 		return NULL;
 
 	if (memcg_data & MEMCG_DATA_KMEM) {
@@ -549,7 +549,7 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob
 static inline bool folio_memcg_kmem(struct folio *folio)
 {
 	VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
-	VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJCGS, folio);
+	VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	return folio->memcg_data & MEMCG_DATA_KMEM;
 }
 
@@ -1593,6 +1593,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 }
 #endif /* CONFIG_MEMCG */
 
+/*
+ * Extended information for slab objects stored as an array in page->memcg_data
+ * if MEMCG_DATA_OBJEXTS is set.
+ */
+struct slabobj_ext {
+	struct obj_cgroup *objcg;
+} __aligned(8);
+
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
 	__mod_lruvec_kmem_state(p, idx, 1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..5b55c4752c23 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -180,7 +180,7 @@ struct page {
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 	atomic_t _refcount;
 
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
 	unsigned long memcg_data;
 #endif
 
@@ -315,7 +315,7 @@ struct folio {
 		};
 		atomic_t _mapcount;
 		atomic_t _refcount;
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
 		unsigned long memcg_data;
 #endif
 	/* private: the union with struct page is transitional */
diff --git a/init/Kconfig b/init/Kconfig
index 6d35728b94b2..78a7abe36037 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -937,10 +937,14 @@ config CGROUP_FAVOR_DYNMODS
 
 	  Say N if unsure.
 
+config SLAB_OBJ_EXT
+	bool
+
 config MEMCG
 	bool "Memory controller"
 	select PAGE_COUNTER
 	select EVENTFD
+	select SLAB_OBJ_EXT
 	help
 	  Provides control over the memory footprint of tasks in a cgroup.
 
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 3872528d0963..02b744d2e07d 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -599,9 +599,9 @@ static unsigned long kfence_init_pool(void)
 			continue;
 
 		__folio_set_slab(slab_folio(slab));
-#ifdef CONFIG_MEMCG
-		slab->memcg_data = (unsigned long)&kfence_metadata_init[i / 2 - 1].objcg |
-				   MEMCG_DATA_OBJCGS;
+#ifdef CONFIG_MEMCG_KMEM
+		slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts |
+				 MEMCG_DATA_OBJEXTS;
 #endif
 	}
 
@@ -649,8 +649,8 @@ static unsigned long kfence_init_pool(void)
 		if (!i || (i % 2))
 			continue;
 
-#ifdef CONFIG_MEMCG
-		slab->memcg_data = 0;
+#ifdef CONFIG_MEMCG_KMEM
+		slab->obj_exts = 0;
 #endif
 		__folio_clear_slab(slab_folio(slab));
 	}
@@ -1143,8 +1143,8 @@ void __kfence_free(void *addr)
 {
 	struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
 
-#ifdef CONFIG_MEMCG
-	KFENCE_WARN_ON(meta->objcg);
+#ifdef CONFIG_MEMCG_KMEM
+	KFENCE_WARN_ON(meta->obj_exts.objcg);
 #endif
 	/*
 	 * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing
diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index f46fbb03062b..084f5f36e8e7 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -97,8 +97,8 @@ struct kfence_metadata {
 	struct kfence_track free_track;
 	/* For updating alloc_covered on frees. */
 	u32 alloc_stack_hash;
-#ifdef CONFIG_MEMCG
-	struct obj_cgroup *objcg;
+#ifdef CONFIG_MEMCG_KMEM
+	struct slabobj_ext obj_exts;
 #endif
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5b009b233ab8..aca777f45d34 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2859,13 +2859,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-/*
- * The allocated objcg pointers array is not accounted directly.
- * Moreover, it should not come from DMA buffer and is not readily
- * reclaimable. So those GFP bits should be masked off.
- */
-#define OBJCGS_CLEAR_MASK	(__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
-
 /*
  * mod_objcg_mlstate() may be called with irq enabled, so
  * mod_memcg_lruvec_state() should be used.
@@ -2884,62 +2877,27 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
 	rcu_read_unlock();
 }
 
-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
-				 gfp_t gfp, bool new_slab)
-{
-	unsigned int objects = objs_per_slab(s, slab);
-	unsigned long memcg_data;
-	void *vec;
-
-	gfp &= ~OBJCGS_CLEAR_MASK;
-	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
-			   slab_nid(slab));
-	if (!vec)
-		return -ENOMEM;
-
-	memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
-	if (new_slab) {
-		/*
-		 * If the slab is brand new and nobody can yet access its
-		 * memcg_data, no synchronization is required and memcg_data can
-		 * be simply assigned.
-		 */
-		slab->memcg_data = memcg_data;
-	} else if (cmpxchg(&slab->memcg_data, 0, memcg_data)) {
-		/*
-		 * If the slab is already in use, somebody can allocate and
-		 * assign obj_cgroups in parallel. In this case the existing
-		 * objcg vector should be reused.
-		 */
-		kfree(vec);
-		return 0;
-	}
-
-	kmemleak_not_leak(vec);
-	return 0;
-}
-
 static __always_inline
 struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
 {
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in
-	 * slab->memcg_data.
+	 * slab->obj_exts.
 	 */
 	if (folio_test_slab(folio)) {
-		struct obj_cgroup **objcgs;
+		struct slabobj_ext *obj_exts;
 		struct slab *slab;
 		unsigned int off;
 
 		slab = folio_slab(folio);
-		objcgs = slab_objcgs(slab);
-		if (!objcgs)
+		obj_exts = slab_obj_exts(slab);
+		if (!obj_exts)
 			return NULL;
 
 		off = obj_to_index(slab->slab_cache, slab, p);
-		if (objcgs[off])
-			return obj_cgroup_memcg(objcgs[off]);
+		if (obj_exts[off].objcg)
+			return obj_cgroup_memcg(obj_exts[off].objcg);
 
 		return NULL;
 	}
@@ -2947,7 +2905,7 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
 	/*
 	 * folio_memcg_check() is used here, because in theory we can encounter
 	 * a folio where the slab flag has been cleared already, but
-	 * slab->memcg_data has not been freed yet
+	 * slab->obj_exts has not been freed yet
 	 * folio_memcg_check() will guarantee that a proper memory
 	 * cgroup pointer or NULL will be returned.
 	 */
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 4e2723e1b300..de6ea5746acd 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -372,7 +372,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 	if (!memcg_data)
 		goto out_unlock;
 
-	if (memcg_data & MEMCG_DATA_OBJCGS)
+	if (memcg_data & MEMCG_DATA_OBJEXTS)
 		ret += scnprintf(kbuf + ret, count - ret,
 				"Slab cache page\n");
 
diff --git a/mm/slab.h b/mm/slab.h
index 799a315695c6..5a47125469f1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -96,8 +96,8 @@ struct slab {
 #endif
 
 	atomic_t __page_refcount;
-#ifdef CONFIG_MEMCG
-	unsigned long memcg_data;
+#ifdef CONFIG_SLAB_OBJ_EXT
+	unsigned long obj_exts;
 #endif
 };
 
@@ -106,8 +106,8 @@ struct slab {
 SLAB_MATCH(flags, __page_flags);
 SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
 SLAB_MATCH(_refcount, __page_refcount);
-#ifdef CONFIG_MEMCG
-SLAB_MATCH(memcg_data, memcg_data);
+#ifdef CONFIG_SLAB_OBJ_EXT
+SLAB_MATCH(memcg_data, obj_exts);
 #endif
 #undef SLAB_MATCH
 static_assert(sizeof(struct slab) <= sizeof(struct page));
@@ -429,36 +429,106 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 	return false;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef CONFIG_SLAB_OBJ_EXT
+
 /*
- * slab_objcgs - get the object cgroups vector associated with a slab
+ * slab_obj_exts - get the pointer to the slab object extension vector
+ * associated with a slab.
  * @slab: a pointer to the slab struct
  *
- * Returns a pointer to the object cgroups vector associated with the slab,
+ * Returns a pointer to the object extension vector associated with the slab,
  * or NULL if no such vector has been associated yet.
  */
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
 {
-	unsigned long memcg_data = READ_ONCE(slab->memcg_data);
+	unsigned long obj_exts = READ_ONCE(slab->obj_exts);
 
-	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS),
+#ifdef CONFIG_MEMCG
+	VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS),
 							slab_page(slab));
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab));
+	VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
 
-	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
+#else
+	return (struct slabobj_ext *)obj_exts;
+#endif
 }
 
-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
-				 gfp_t gfp, bool new_slab);
-void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
-		     enum node_stat_item idx, int nr);
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+			gfp_t gfp, bool new_slab);
 
-static inline void memcg_free_slab_cgroups(struct slab *slab)
+static inline bool need_slab_obj_ext(void)
 {
-	kfree(slab_objcgs(slab));
-	slab->memcg_data = 0;
+	/*
+	 * CONFIG_MEMCG_KMEM creates a vector of obj_cgroup objects conditionally
+	 * inside memcg_slab_post_alloc_hook. No other users for now.
+	 */
+	return false;
 }
 
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+	struct slabobj_ext *obj_exts;
+
+	obj_exts = slab_obj_exts(slab);
+	if (!obj_exts)
+		return;
+
+	kfree(obj_exts);
+	slab->obj_exts = 0;
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+	struct slab *slab;
+
+	if (!p)
+		return NULL;
+
+	if (!need_slab_obj_ext())
+		return NULL;
+
+	slab = virt_to_slab(p);
+	if (!slab_obj_exts(slab) &&
+	    WARN(alloc_slab_obj_exts(slab, s, flags, false),
+		 "%s, %s: Failed to create slab extension vector!\n",
+		 __func__, s->name))
+		return NULL;
+
+	return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+}
+
+#else /* CONFIG_SLAB_OBJ_EXT */
+
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+{
+	return NULL;
+}
+
+static inline int alloc_slab_obj_exts(struct slab *slab,
+				      struct kmem_cache *s, gfp_t gfp,
+				      bool new_slab)
+{
+	return 0;
+}
+
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+	return NULL;
+}
+
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
+#ifdef CONFIG_MEMCG_KMEM
+void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
+		     enum node_stat_item idx, int nr);
+
 static inline size_t obj_full_size(struct kmem_cache *s)
 {
 	/*
@@ -526,16 +596,15 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 		if (likely(p[i])) {
 			slab = virt_to_slab(p[i]);
 
-			if (!slab_objcgs(slab) &&
-			    memcg_alloc_slab_cgroups(slab, s, flags,
-							 false)) {
+			if (!slab_obj_exts(slab) &&
+			    alloc_slab_obj_exts(slab, s, flags, false)) {
 				obj_cgroup_uncharge(objcg, obj_full_size(s));
 				continue;
 			}
 
 			off = obj_to_index(s, slab, p[i]);
 			obj_cgroup_get(objcg);
-			slab_objcgs(slab)[off] = objcg;
+			slab_obj_exts(slab)[off].objcg = objcg;
 			mod_objcg_state(objcg, slab_pgdat(slab),
 					cache_vmstat_idx(s), obj_full_size(s));
 		} else {
@@ -548,14 +617,14 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 					void **p, int objects)
 {
-	struct obj_cgroup **objcgs;
+	struct slabobj_ext *obj_exts;
 	int i;
 
 	if (!memcg_kmem_online())
 		return;
 
-	objcgs = slab_objcgs(slab);
-	if (!objcgs)
+	obj_exts = slab_obj_exts(slab);
+	if (!obj_exts)
 		return;
 
 	for (i = 0; i < objects; i++) {
@@ -563,11 +632,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 		unsigned int off;
 
 		off = obj_to_index(s, slab, p[i]);
-		objcg = objcgs[off];
+		objcg = obj_exts[off].objcg;
 		if (!objcg)
 			continue;
 
-		objcgs[off] = NULL;
+		obj_exts[off].objcg = NULL;
 		obj_cgroup_uncharge(objcg, obj_full_size(s));
 		mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
 				-obj_full_size(s));
@@ -576,27 +645,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 }
 
 #else /* CONFIG_MEMCG_KMEM */
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
-{
-	return NULL;
-}
-
 static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
 {
 	return NULL;
 }
 
-static inline int memcg_alloc_slab_cgroups(struct slab *slab,
-					   struct kmem_cache *s, gfp_t gfp,
-					   bool new_slab)
-{
-	return 0;
-}
-
-static inline void memcg_free_slab_cgroups(struct slab *slab)
-{
-}
-
 static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 					     struct list_lru *lru,
 					     struct obj_cgroup **objcgp,
@@ -633,7 +686,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
 					 struct kmem_cache *s, gfp_t gfp)
 {
 	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
-		memcg_alloc_slab_cgroups(slab, s, gfp, true);
+		alloc_slab_obj_exts(slab, s, gfp, true);
 
 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
 			    PAGE_SIZE << order);
@@ -642,8 +695,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
 static __always_inline void unaccount_slab(struct slab *slab, int order,
 					   struct kmem_cache *s)
 {
-	if (memcg_kmem_online())
-		memcg_free_slab_cgroups(slab);
+	free_slab_obj_exts(slab);
 
 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
 			    -(PAGE_SIZE << order));
@@ -723,6 +775,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 					unsigned int orig_size)
 {
 	unsigned int zero_size = s->object_size;
+	struct slabobj_ext *obj_exts;
 	bool kasan_init = init;
 	size_t i;
 
@@ -765,6 +818,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
 					 s->flags, flags);
 		kmsan_slab_alloc(s, p[i], flags);
+		obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
 	}
 
 	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9bbffe82d65a..2b42a9d2c11c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -206,6 +206,53 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
 	return NULL;
 }
 
+#ifdef CONFIG_SLAB_OBJ_EXT
+/*
+ * The allocated slabobj_ext array is not accounted directly.
+ * Moreover, it should not come from DMA buffer and is not readily
+ * reclaimable. So those GFP bits should be masked off.
+ */
+#define OBJCGS_CLEAR_MASK	(__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
+
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+			gfp_t gfp, bool new_slab)
+{
+	unsigned int objects = objs_per_slab(s, slab);
+	unsigned long obj_exts;
+	void *vec;
+
+	gfp &= ~OBJCGS_CLEAR_MASK;
+	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
+			   slab_nid(slab));
+	if (!vec)
+		return -ENOMEM;
+
+	obj_exts = (unsigned long)vec;
+#ifdef CONFIG_MEMCG
+	obj_exts |= MEMCG_DATA_OBJEXTS;
+#endif
+	if (new_slab) {
+		/*
+		 * If the slab is brand new and nobody can yet access its
+		 * obj_exts, no synchronization is required and obj_exts can
+		 * be simply assigned.
+		 */
+		slab->obj_exts = obj_exts;
+	} else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+		/*
+		 * If the slab is already in use, somebody can allocate and
+		 * assign slabobj_exts in parallel. In this case the existing
+		 * slabobj_ext vector should be reused.
+		 */
+		kfree(vec);
+		return 0;
+	}
+
+	kmemleak_not_leak(vec);
+	return 0;
+}
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
 static struct kmem_cache *create_cache(const char *name,
 		unsigned int object_size, unsigned int align,
 		slab_flags_t flags, unsigned int useroffset,