From: Qi Zheng
To: akpm@linux-foundation.org, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com
Cc: tkhai@ya.ru, sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH 2/5] mm: vmscan: make memcg slab shrink lockless
Date: Mon, 20 Feb 2023 17:16:34 +0800
Message-Id: <20230220091637.64865-3-zhengqi.arch@bytedance.com>
In-Reply-To: <20230220091637.64865-1-zhengqi.arch@bytedance.com>
References: <20230220091637.64865-1-zhengqi.arch@bytedance.com>

As was done for the global slab shrink, and now that commit 1cd0bd06093c ("rcu: Remove CONFIG_SRCU") has made SRCU always available, use SRCU to protect the readers that previously held shrinker_rwsem.
We can test with the following script:

```
DIR="/root/shrinker/memcg/mnt"

# Create child memcgs 0..$1 under a 200M-limited parent (cgroup v1).
do_create()
{
	mkdir /sys/fs/cgroup/memory/test
	echo 200M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
	for i in $(seq 0 "$1"); do
		mkdir /sys/fs/cgroup/memory/test/$i
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs
		mkdir -p "$DIR/$i"
	done
}

# Mount a separate tmpfs instance for each memcg in [$1, $2].
do_mount()
{
	for i in $(seq "$1" "$2"); do
		mount -t tmpfs $i "$DIR/$i"
	done
}

# From inside each memcg, write a 1M file to its tmpfs in the background.
do_touch()
{
	for i in $(seq "$1" "$2"); do
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs
		dd if=/dev/zero of="$DIR/$i/file$i" bs=1M count=1 &
	done
}

do_create 2000
do_mount 0 2000
do_touch 0 1000
```

Before applying:

  46.60%  [kernel]  [k] down_read_trylock
  18.70%  [kernel]  [k] up_read
  15.44%  [kernel]  [k] shrink_slab
   4.37%  [kernel]  [k] _find_next_bit
   2.75%  [kernel]  [k] xa_load
   2.07%  [kernel]  [k] idr_find
   1.73%  [kernel]  [k] do_shrink_slab
   1.42%  [kernel]  [k] shrink_lruvec
   0.74%  [kernel]  [k] shrink_node
   0.60%  [kernel]  [k] list_lru_count_one

After applying:

  19.53%  [kernel]  [k] _find_next_bit
  14.63%  [kernel]  [k] do_shrink_slab
  14.58%  [kernel]  [k] shrink_slab
  11.83%  [kernel]  [k] shrink_lruvec
   9.33%  [kernel]  [k] __blk_flush_plug
   6.67%  [kernel]  [k] mem_cgroup_iter
   3.73%  [kernel]  [k] list_lru_count_one
   2.43%  [kernel]  [k] shrink_node
   1.96%  [kernel]  [k] super_cache_count
   1.78%  [kernel]  [k] __rcu_read_unlock
   1.38%  [kernel]  [k] __srcu_read_lock
   1.30%  [kernel]  [k] xas_descend

The readers are no longer blocked. Before the patch, down_read_trylock() and up_read() on shrinker_rwsem together accounted for about 65% of the samples (46.60% + 18.70%). After the patch, neither appears in the profile.
Signed-off-by: Qi Zheng
---
 mm/vmscan.c | 56 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 95a3d6ddc6c1..dc47396ecd0e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -221,8 +222,21 @@ static inline int shrinker_defer_size(int nr_items)
 static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
 						     int nid)
 {
-	return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
-					 lockdep_is_held(&shrinker_rwsem));
+	return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info,
+				      &shrinker_srcu,
+				      lockdep_is_held(&shrinker_rwsem));
+}
+
+static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg,
+						int nid)
+{
+	return srcu_dereference(memcg->nodeinfo[nid]->shrinker_info,
+				&shrinker_srcu);
+}
+
+static void free_shrinker_info_rcu(struct rcu_head *head)
+{
+	kvfree(container_of(head, struct shrinker_info, rcu));
 }
 
 static int expand_one_shrinker_info(struct mem_cgroup *memcg,
@@ -257,7 +271,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
 				       defer_size - old_defer_size);
 
 		rcu_assign_pointer(pn->shrinker_info, new);
-		kvfree_rcu(old, rcu);
+		call_srcu(&shrinker_srcu, &old->rcu, free_shrinker_info_rcu);
 	}
 
 	return 0;
@@ -350,13 +364,14 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
 {
 	if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
 		struct shrinker_info *info;
+		int srcu_idx;
 
-		rcu_read_lock();
-		info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+		srcu_idx = srcu_read_lock(&shrinker_srcu);
+		info = shrinker_info_srcu(memcg, nid);
 		/* Pairs with smp mb in shrink_slab() */
 		smp_mb__before_atomic();
 		set_bit(shrinker_id, info->map);
-		rcu_read_unlock();
+		srcu_read_unlock(&shrinker_srcu, srcu_idx);
 	}
 }
 
@@ -370,7 +385,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 		return -ENOSYS;
 
 	down_write(&shrinker_rwsem);
-	/* This may call shrinker, so it must use down_read_trylock() */
 	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto unlock;
@@ -404,7 +418,7 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
 {
 	struct shrinker_info *info;
 
-	info = shrinker_info_protected(memcg, nid);
+	info = shrinker_info_srcu(memcg, nid);
 	return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
 }
 
@@ -413,13 +427,13 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
 {
 	struct shrinker_info *info;
 
-	info = shrinker_info_protected(memcg, nid);
+	info = shrinker_info_srcu(memcg, nid);
 	return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
 }
 
 void reparent_shrinker_deferred(struct mem_cgroup *memcg)
 {
-	int i, nid;
+	int i, nid, srcu_idx;
 	long nr;
 	struct mem_cgroup *parent;
 	struct shrinker_info *child_info, *parent_info;
@@ -429,16 +443,16 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
 		parent = root_mem_cgroup;
 
 	/* Prevent from concurrent shrinker_info expand */
-	down_read(&shrinker_rwsem);
+	srcu_idx = srcu_read_lock(&shrinker_srcu);
 	for_each_node(nid) {
-		child_info = shrinker_info_protected(memcg, nid);
-		parent_info = shrinker_info_protected(parent, nid);
+		child_info = shrinker_info_srcu(memcg, nid);
+		parent_info = shrinker_info_srcu(parent, nid);
 		for (i = 0; i < shrinker_nr_max; i++) {
 			nr = atomic_long_read(&child_info->nr_deferred[i]);
 			atomic_long_add(nr, &parent_info->nr_deferred[i]);
 		}
 	}
-	up_read(&shrinker_rwsem);
+	srcu_read_unlock(&shrinker_srcu, srcu_idx);
 }
 
 static bool cgroup_reclaim(struct scan_control *sc)
@@ -891,15 +905,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 {
 	struct shrinker_info *info;
 	unsigned long ret, freed = 0;
+	int srcu_idx;
 	int i;
 
 	if (!mem_cgroup_online(memcg))
 		return 0;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 0;
-
-	info = shrinker_info_protected(memcg, nid);
+	srcu_idx = srcu_read_lock(&shrinker_srcu);
+	info = shrinker_info_srcu(memcg, nid);
 	if (unlikely(!info))
 		goto unlock;
 
@@ -949,14 +962,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 				set_shrinker_bit(memcg, nid, i);
 		}
 		freed += ret;
-
-		if (rwsem_is_contended(&shrinker_rwsem)) {
-			freed = freed ? : 1;
-			break;
-		}
 	}
 unlock:
-	up_read(&shrinker_rwsem);
+	srcu_read_unlock(&shrinker_srcu, srcu_idx);
 	return freed;
 }
 #else /* CONFIG_MEMCG */