From patchwork Fri Apr 12 09:24:39 2024
X-Patchwork-Submitter: Peng Zhang
X-Patchwork-Id: 13627432
From: Peng Zhang <zhangpeng362@huawei.com>
Subject: [RFC PATCH 1/3] Lazy percpu counters
Date: Fri, 12 Apr 2024 17:24:39 +0800
Message-ID: <20240412092441.3112481-2-zhangpeng362@huawei.com>
In-Reply-To: <20240412092441.3112481-1-zhangpeng362@huawei.com>
References: <20240412092441.3112481-1-zhangpeng362@huawei.com>

From: Kent Overstreet

This patch adds lib/lazy-percpu-counter.c, which implements counters that
start out as atomics, but lazily switch to percpu mode if the update rate
crosses some threshold (arbitrarily set at 256 per second).
Signed-off-by: Kent Overstreet
Signed-off-by: Suren Baghdasaryan
Signed-off-by: ZhangPeng
---
 include/linux/lazy-percpu-counter.h | 82 +++++++++++++++++++++++++++++
 lib/Makefile                        |  2 +-
 lib/lazy-percpu-counter.c           | 82 +++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/lazy-percpu-counter.h
 create mode 100644 lib/lazy-percpu-counter.c

diff --git a/include/linux/lazy-percpu-counter.h b/include/linux/lazy-percpu-counter.h
new file mode 100644
index 000000000000..281b8dd88cb2
--- /dev/null
+++ b/include/linux/lazy-percpu-counter.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lazy percpu counters:
+ * (C) 2022 Kent Overstreet
+ *
+ * Lazy percpu counters start out in atomic mode, then switch to percpu mode if
+ * the update rate crosses some threshold.
+ *
+ * This means we don't have to decide between low memory overhead atomic
+ * counters and higher performance percpu counters - we can have our cake and
+ * eat it, too!
+ *
+ * Internally we use an atomic64_t, where the low bit indicates whether we're in
+ * percpu mode, and the high 8 bits are a secondary counter that's incremented
+ * when the counter is modified - meaning 55 bits of precision are available for
+ * the counter itself.
+ */

+#ifndef _LINUX_LAZY_PERCPU_COUNTER_H
+#define _LINUX_LAZY_PERCPU_COUNTER_H
+
+#include <linux/atomic.h>
+#include <linux/percpu.h>
+
+struct lazy_percpu_counter {
+	atomic64_t v;
+	unsigned long last_wrap;
+};
+
+void lazy_percpu_counter_exit(struct lazy_percpu_counter *c);
+void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i);
+
+/*
+ * We use the high bits of the atomic counter for a secondary counter, which is
+ * incremented every time the counter is touched. When the secondary counter
+ * wraps, we check the time the counter last wrapped, and if it was recent
+ * enough that means the update frequency has crossed our threshold and we
+ * switch to percpu mode:
+ */
+#define COUNTER_MOD_BITS		8
+#define COUNTER_MOD_MASK		~(~0ULL >> COUNTER_MOD_BITS)
+#define COUNTER_MOD_BITS_START		(64 - COUNTER_MOD_BITS)
+
+/*
+ * We use the low bit of the counter to indicate whether we're in atomic mode
+ * (low bit clear), or percpu mode (low bit set, counter is a pointer to actual
+ * percpu counters:
+ */
+#define COUNTER_IS_PCPU_BIT		1
+
+static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v)
+{
+	if (!(v & COUNTER_IS_PCPU_BIT))
+		return NULL;
+
+	v ^= COUNTER_IS_PCPU_BIT;
+	return (u64 __percpu *)(unsigned long)v;
+}
+
+/**
+ * lazy_percpu_counter_add: Add a value to a lazy_percpu_counter
+ *
+ * @c: counter to modify
+ * @i: value to add
+ */
+static inline void lazy_percpu_counter_add(struct lazy_percpu_counter *c, s64 i)
+{
+	u64 v = atomic64_read(&c->v);
+	u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+	if (likely(pcpu_v))
+		this_cpu_add(*pcpu_v, i);
+	else
+		lazy_percpu_counter_add_slowpath(c, i);
+}
+
+static inline void lazy_percpu_counter_sub(struct lazy_percpu_counter *c, s64 i)
+{
+	lazy_percpu_counter_add(c, -i);
+}
+
+#endif /* _LINUX_LAZY_PERCPU_COUNTER_H */
diff --git a/lib/Makefile b/lib/Makefile
index 2f4e17bfb299..7afa0c3e7cc7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,7 +46,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
	 list_sort.o uuid.o iov_iter.o clz_ctz.o \
	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
-	 percpu-refcount.o rhashtable.o base64.o \
+	 percpu-refcount.o lazy-percpu-counter.o rhashtable.o base64.o \
	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
	 generic-radix-tree.o bitmap-str.o
 obj-$(CONFIG_STRING_KUNIT_TEST) += string_kunit.o
diff --git a/lib/lazy-percpu-counter.c b/lib/lazy-percpu-counter.c
new file mode 100644
index 000000000000..e1914207214d
--- /dev/null
+++ b/lib/lazy-percpu-counter.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/atomic.h>
+#include <linux/gfp.h>
+#include <linux/jiffies.h>
+#include <linux/lazy-percpu-counter.h>
+#include <linux/percpu.h>
+
+static inline s64 lazy_percpu_counter_atomic_val(s64 v)
+{
+	/* Ensure output is sign extended properly: */
+	return (v << COUNTER_MOD_BITS) >>
+		(COUNTER_MOD_BITS + COUNTER_IS_PCPU_BIT);
+}
+
+static void lazy_percpu_counter_switch_to_pcpu(struct lazy_percpu_counter *c)
+{
+	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
+	u64 old, new, v;
+
+	if (!pcpu_v)
+		return;
+
+	preempt_disable();
+	v = atomic64_read(&c->v);
+	do {
+		if (lazy_percpu_counter_is_pcpu(v)) {
+			free_percpu(pcpu_v);
+			return;
+		}
+
+		old = v;
+		new = (unsigned long)pcpu_v | 1;
+
+		*this_cpu_ptr(pcpu_v) = lazy_percpu_counter_atomic_val(v);
+	} while ((v = atomic64_cmpxchg(&c->v, old, new)) != old);
+	preempt_enable();
+}
+
+/**
+ * lazy_percpu_counter_exit: Free resources associated with a
+ * lazy_percpu_counter
+ *
+ * @c: counter to exit
+ */
+void lazy_percpu_counter_exit(struct lazy_percpu_counter *c)
+{
+	free_percpu(lazy_percpu_counter_is_pcpu(atomic64_read(&c->v)));
+}
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_exit);
+
+void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i)
+{
+	u64 atomic_i;
+	u64 old, v = atomic64_read(&c->v);
+	u64 __percpu *pcpu_v;
+
+	atomic_i  = i << COUNTER_IS_PCPU_BIT;
+	atomic_i &= ~COUNTER_MOD_MASK;
+	atomic_i |= 1ULL << COUNTER_MOD_BITS_START;
+
+	do {
+		pcpu_v = lazy_percpu_counter_is_pcpu(v);
+		if (pcpu_v) {
+			this_cpu_add(*pcpu_v, i);
+			return;
+		}
+
+		old = v;
+	} while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old);
+
+	if (unlikely(!(v & COUNTER_MOD_MASK))) {
+		unsigned long now = jiffies;
+
+		if (c->last_wrap &&
+		    unlikely(time_after(c->last_wrap + HZ, now)))
+			lazy_percpu_counter_switch_to_pcpu(c);
+		else
+			c->last_wrap = now;
+	}
+}
+EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath);

From patchwork Fri Apr 12 09:24:40 2024
X-Patchwork-Submitter: Peng Zhang
X-Patchwork-Id: 13627433
From: Peng Zhang <zhangpeng362@huawei.com>
Subject: [RFC PATCH 2/3] lazy_percpu_counter: include struct percpu_counter in
 struct lazy_percpu_counter
Date: Fri, 12 Apr 2024 17:24:40 +0800
Message-ID: <20240412092441.3112481-3-zhangpeng362@huawei.com>
In-Reply-To: <20240412092441.3112481-1-zhangpeng362@huawei.com>
References: <20240412092441.3112481-1-zhangpeng362@huawei.com>

From: ZhangPeng

Add the struct percpu_counter fbc to struct lazy_percpu_counter. Convert
the u64 __percpu parameter of the lazy percpu counter functions to a
struct percpu_counter parameter, to prepare for converting mm's rss stats
into lazy_percpu_counter.
Signed-off-by: ZhangPeng
Signed-off-by: Kefeng Wang
---
 include/linux/lazy-percpu-counter.h | 16 ++++--
 lib/lazy-percpu-counter.c           | 83 +++++++++++++++++++++++------
 2 files changed, 77 insertions(+), 22 deletions(-)

diff --git a/include/linux/lazy-percpu-counter.h b/include/linux/lazy-percpu-counter.h
index 281b8dd88cb2..03ff24f0128d 100644
--- a/include/linux/lazy-percpu-counter.h
+++ b/include/linux/lazy-percpu-counter.h
@@ -20,15 +20,21 @@
 #define _LINUX_LAZY_PERCPU_COUNTER_H
 
 #include <linux/atomic.h>
+#include <linux/percpu_counter.h>
 #include <linux/percpu.h>
 
 struct lazy_percpu_counter {
	atomic64_t v;
	unsigned long last_wrap;
+	struct percpu_counter fbc;
 };
 
-void lazy_percpu_counter_exit(struct lazy_percpu_counter *c);
+void lazy_percpu_counter_destroy_many(struct lazy_percpu_counter *c,
+				      u32 nr_counters);
 void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i);
+s64 lazy_percpu_counter_read_positive(struct lazy_percpu_counter *c);
+s64 lazy_percpu_counter_sum(struct lazy_percpu_counter *c);
+s64 lazy_percpu_counter_sum_positive(struct lazy_percpu_counter *c);
 
 /*
  * We use the high bits of the atomic counter for a secondary counter, which is
@@ -48,13 +54,13 @@ void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i);
  */
 #define COUNTER_IS_PCPU_BIT	1
 
-static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v)
+static inline struct percpu_counter *lazy_percpu_counter_is_pcpu(u64 v)
 {
	if (!(v & COUNTER_IS_PCPU_BIT))
		return NULL;
 
	v ^= COUNTER_IS_PCPU_BIT;
-	return (u64 __percpu *)(unsigned long)v;
+	return (struct percpu_counter *)(unsigned long)v;
 }
 
 /**
@@ -66,10 +72,10 @@ static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v)
 static inline void lazy_percpu_counter_add(struct lazy_percpu_counter *c, s64 i)
 {
	u64 v = atomic64_read(&c->v);
-	u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+	struct percpu_counter *pcpu_v = lazy_percpu_counter_is_pcpu(v);
 
	if (likely(pcpu_v))
-		this_cpu_add(*pcpu_v, i);
+		percpu_counter_add(pcpu_v, i);
	else
		lazy_percpu_counter_add_slowpath(c, i);
 }
diff --git a/lib/lazy-percpu-counter.c b/lib/lazy-percpu-counter.c
index e1914207214d..c360903cc02a 100644
--- a/lib/lazy-percpu-counter.c
+++ b/lib/lazy-percpu-counter.c
@@ -15,45 +15,94 @@ static inline s64 lazy_percpu_counter_atomic_val(s64 v)
 
 static void lazy_percpu_counter_switch_to_pcpu(struct lazy_percpu_counter *c)
 {
-	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
	u64 old, new, v;
+	unsigned long flags;
+	bool allocated = false;
 
-	if (!pcpu_v)
-		return;
-
+	local_irq_save(flags);
	preempt_disable();
	v = atomic64_read(&c->v);
	do {
-		if (lazy_percpu_counter_is_pcpu(v)) {
-			free_percpu(pcpu_v);
-			return;
+		if (lazy_percpu_counter_is_pcpu(v))
+			break;
+
+		if (!allocated) {
+			if (percpu_counter_init(&c->fbc, 0, GFP_ATOMIC|__GFP_NOWARN))
+				break;
+			allocated = true;
		}
 
		old = v;
-		new = (unsigned long)pcpu_v | 1;
+		new = (unsigned long)&c->fbc | 1;
 
-		*this_cpu_ptr(pcpu_v) = lazy_percpu_counter_atomic_val(v);
+		percpu_counter_set(&c->fbc, lazy_percpu_counter_atomic_val(v));
	} while ((v = atomic64_cmpxchg(&c->v, old, new)) != old);
	preempt_enable();
+	local_irq_restore(flags);
 }
 
 /**
- * lazy_percpu_counter_exit: Free resources associated with a
- * lazy_percpu_counter
+ * lazy_percpu_counter_destroy_many: Free resources associated with
+ * lazy_percpu_counters
  *
- * @c: counter to exit
+ * @c: counters to exit
+ * @nr_counters: number of counters
  */
-void lazy_percpu_counter_exit(struct lazy_percpu_counter *c)
+void lazy_percpu_counter_destroy_many(struct lazy_percpu_counter *c,
+				      u32 nr_counters)
+{
+	struct percpu_counter *pcpu_v;
+	u32 i;
+
+	for (i = 0; i < nr_counters; i++) {
+		pcpu_v = lazy_percpu_counter_is_pcpu(atomic64_read(&c[i].v));
+		if (pcpu_v)
+			percpu_counter_destroy(pcpu_v);
+	}
+}
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_destroy_many);
+
+s64 lazy_percpu_counter_read_positive(struct lazy_percpu_counter *c)
+{
+	s64 v = atomic64_read(&c->v);
+	struct percpu_counter *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+	if (pcpu_v)
+		return percpu_counter_read_positive(pcpu_v);
+
+	return lazy_percpu_counter_atomic_val(v);
+}
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_read_positive);
+
+s64 lazy_percpu_counter_sum(struct lazy_percpu_counter *c)
+{
+	s64 v = atomic64_read(&c->v);
+	struct percpu_counter *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+	if (pcpu_v)
+		return percpu_counter_sum(pcpu_v);
+
+	return lazy_percpu_counter_atomic_val(v);
+}
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_sum);
+
+s64 lazy_percpu_counter_sum_positive(struct lazy_percpu_counter *c)
 {
-	free_percpu(lazy_percpu_counter_is_pcpu(atomic64_read(&c->v)));
+	s64 v = atomic64_read(&c->v);
+	struct percpu_counter *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+	if (pcpu_v)
+		return percpu_counter_sum_positive(pcpu_v);
+
+	return lazy_percpu_counter_atomic_val(v);
 }
-EXPORT_SYMBOL_GPL(lazy_percpu_counter_exit);
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_sum_positive);
 
 void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i)
 {
	u64 atomic_i;
	u64 old, v = atomic64_read(&c->v);
-	u64 __percpu *pcpu_v;
+	struct percpu_counter *pcpu_v;
 
	atomic_i = i << COUNTER_IS_PCPU_BIT;
	atomic_i &= ~COUNTER_MOD_MASK;
@@ -62,7 +111,7 @@ void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i)
	do {
		pcpu_v = lazy_percpu_counter_is_pcpu(v);
		if (pcpu_v) {
-			this_cpu_add(*pcpu_v, i);
+			percpu_counter_add(pcpu_v, i);
			return;
		}

From patchwork Fri Apr 12 09:24:41 2024
X-Patchwork-Submitter: Peng Zhang
X-Patchwork-Id: 13627434
From: Peng Zhang <zhangpeng362@huawei.com>
Subject: [RFC PATCH 3/3] mm: convert mm's rss stats into lazy_percpu_counter
Date: Fri, 12 Apr 2024 17:24:41 +0800
Message-ID: <20240412092441.3112481-4-zhangpeng362@huawei.com>
In-Reply-To: <20240412092441.3112481-1-zhangpeng362@huawei.com>
References: <20240412092441.3112481-1-zhangpeng362@huawei.com>
From: ZhangPeng

Since commit f1a7941243c1 ("mm: convert mm's rss stats into
percpu_counter"), the rss_stats have been converted into percpu_counter,
which changes the error margin from (nr_threads * 64) to approximately
(nr_cpus ^ 2). However, the new percpu allocation in mm_init() causes a
performance regression on fork/exec/shell. Even after commit 14ef95be6f55
("kernel/fork: group allocation/free of per-cpu counters for mm struct"),
the performance of fork/exec/shell is still poor compared to previous
kernel versions.
To mitigate the performance regression, use lazy_percpu_counter to delay
the allocation of percpu memory for rss_stats. In lmbench tests, this
yields a 3% ~ 6% performance improvement for fork_proc/exec_proc/
shell_proc. The test results are as follows:

             base        base+revert      base+lazy_percpu_counter
fork_proc    427.4ms     394.1ms (7.8%)   413.9ms (3.2%)
exec_proc    2205.1ms    2042.2ms (7.4%)  2072.0ms (6.0%)
shell_proc   3180.9ms    2963.7ms (6.8%)  3010.7ms (5.4%)

Signed-off-by: ZhangPeng
Signed-off-by: Kefeng Wang
---
 include/linux/mm.h          |  8 ++++----
 include/linux/mm_types.h    |  4 ++--
 include/trace/events/kmem.h |  4 ++--
 kernel/fork.c               | 12 ++++--------
 4 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 07c73451d42f..d1ea246b99c3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2631,28 +2631,28 @@ static inline bool get_user_page_fast_only(unsigned long addr,
  */
 static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
 {
-	return percpu_counter_read_positive(&mm->rss_stat[member]);
+	return lazy_percpu_counter_read_positive(&mm->rss_stat[member]);
 }
 
 void mm_trace_rss_stat(struct mm_struct *mm, int member);
 
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
-	percpu_counter_add(&mm->rss_stat[member], value);
+	lazy_percpu_counter_add(&mm->rss_stat[member], value);
 
	mm_trace_rss_stat(mm, member);
 }
 
 static inline void inc_mm_counter(struct mm_struct *mm, int member)
 {
-	percpu_counter_inc(&mm->rss_stat[member]);
+	lazy_percpu_counter_add(&mm->rss_stat[member], 1);
 
	mm_trace_rss_stat(mm, member);
 }
 
 static inline void dec_mm_counter(struct mm_struct *mm, int member)
 {
-	percpu_counter_dec(&mm->rss_stat[member]);
+	lazy_percpu_counter_sub(&mm->rss_stat[member], 1);
 
	mm_trace_rss_stat(mm, member);
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c432add95913..bf44c3a6fc99 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -18,7 +18,7 @@
 #include <linux/page-flags-layout.h>
 #include <linux/workqueue.h>
 #include <linux/seqlock.h>
-#include <linux/percpu_counter.h>
+#include <linux/lazy-percpu-counter.h>
 
 #include <asm/mmu.h>
@@ -898,7 +898,7 @@ struct mm_struct {
 
		unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
 
-		struct percpu_counter rss_stat[NR_MM_COUNTERS];
+		struct lazy_percpu_counter rss_stat[NR_MM_COUNTERS];
 
		struct linux_binfmt *binfmt;
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 6e62cc64cd92..3a35d9a665b7 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -399,8 +399,8 @@ TRACE_EVENT(rss_stat,
		__entry->mm_id = mm_ptr_to_hash(mm);
		__entry->curr = !!(current->mm == mm);
		__entry->member = member;
-		__entry->size = (percpu_counter_sum_positive(&mm->rss_stat[member])
-							    << PAGE_SHIFT);
+		__entry->size = (lazy_percpu_counter_sum_positive(&mm->rss_stat[member])
+							    << PAGE_SHIFT);
	),
 
	TP_printk("mm_id=%u curr=%d type=%s size=%ldB",
diff --git a/kernel/fork.c b/kernel/fork.c
index 99076dbe27d8..0a4efb436030 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -823,7 +823,7 @@ static void check_mm(struct mm_struct *mm)
			 "Please make sure 'struct resident_page_types[]' is updated as well");
 
	for (i = 0; i < NR_MM_COUNTERS; i++) {
-		long x = percpu_counter_sum(&mm->rss_stat[i]);
+		long x = lazy_percpu_counter_sum(&mm->rss_stat[i]);
 
		if (unlikely(x))
			pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
@@ -910,6 +910,8 @@ static void cleanup_lazy_tlbs(struct mm_struct *mm)
  */
 void __mmdrop(struct mm_struct *mm)
 {
+	int i;
+
	BUG_ON(mm == &init_mm);
	WARN_ON_ONCE(mm == current->mm);
 
@@ -924,7 +926,7 @@ void __mmdrop(struct mm_struct *mm)
	put_user_ns(mm->user_ns);
	mm_pasid_drop(mm);
	mm_destroy_cid(mm);
-	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
+	lazy_percpu_counter_destroy_many(&mm->rss_stat[i], NR_MM_COUNTERS);
 
	free_mm(mm);
 }
@@ -1301,16 +1303,10 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
	if (mm_alloc_cid(mm))
		goto fail_cid;
 
-	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT,
-				     NR_MM_COUNTERS))
-		goto fail_pcpu;
-
	mm->user_ns = get_user_ns(user_ns);
	lru_gen_init_mm(mm);
	return mm;
 
-fail_pcpu:
-	mm_destroy_cid(mm);
 fail_cid:
	destroy_context(mm);
 fail_nocontext: