From patchwork Tue Mar 7 06:55:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13162867
From: Qi Zheng
To: akpm@linux-foundation.org, tkhai@ya.ru, hannes@cmpxchg.org,
 shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev,
 muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com,
 rppt@kernel.org
Cc: sultan@kerneltoast.com, dave@stgolabs.net,
 penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 2/8] mm: vmscan: make global slab shrink lockless
Date: Tue, 7 Mar 2023 14:55:59 +0800
Message-Id: <20230307065605.58209-3-zhengqi.arch@bytedance.com>
In-Reply-To: <20230307065605.58209-1-zhengqi.arch@bytedance.com>
References: <20230307065605.58209-1-zhengqi.arch@bytedance.com>

The shrinker_rwsem is a global read-write lock in the shrinker subsystem,
which protects most
operations such as slab shrink, registration and unregistration of
shrinkers, etc. This can easily cause problems in the following cases.

1) When the memory pressure is high and many filesystems are mounted or
   unmounted at the same time, slab shrink is affected
   (down_read_trylock() fails).

   One example is the real workload described by Kirill Tkhai:

   ```
   One of the real workloads from my experience is start
   of an overcommitted node containing many starting
   containers after node crash (or many resuming containers
   after reboot for kernel update). In these cases memory
   pressure is huge, and the node goes round in long reclaim.
   ```

2) If a shrinker is blocked (such as the case mentioned in [1]) and a
   writer comes in (such as mounting a filesystem), then this writer is
   blocked and causes all subsequent shrinker-related operations to be
   blocked as well.

Even if there is no competitor when shrinking slab, there may still be a
problem. If we have a long shrinker list and we do not reclaim enough
memory with each shrinker, then down_read_trylock() may be called with
high frequency. Because of the poor multicore scalability of atomic
operations, this can lead to a significant drop in IPC (instructions per
cycle).

There have been several attempts ([2], [3], [4], [5]) to replace the
shrinker_rwsem trylock with SRCU in slab shrink, but all of these patches
were abandoned because SRCU was not unconditionally enabled.

Now, since commit 1cd0bd06093c ("rcu: Remove CONFIG_SRCU"), SRCU is
unconditionally enabled. So it's time to use SRCU to protect readers who
previously held shrinker_rwsem.

This commit uses SRCU to make global slab shrink lockless; the memcg slab
shrink is handled in the subsequent patch.

[1]. https://lore.kernel.org/lkml/20191129214541.3110-1-ptikhomirov@virtuozzo.com/
[2]. https://lore.kernel.org/all/1437080113.3596.2.camel@stgolabs.net/
[3]. https://lore.kernel.org/lkml/1510609063-3327-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/
[4].
https://lore.kernel.org/lkml/153365347929.19074.12509495712735843805.stgit@localhost.localdomain/
[5]. https://lore.kernel.org/lkml/20210927074823.5825-1-sultan@kerneltoast.com/

Signed-off-by: Qi Zheng
Acked-by: Vlastimil Babka
Acked-by: Kirill Tkhai
---
 mm/vmscan.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2dcc01682026..8515ac40bcaf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -202,6 +202,7 @@ static void set_task_reclaim_state(struct task_struct *task,
 
 LIST_HEAD(shrinker_list);
 DECLARE_RWSEM(shrinker_rwsem);
+DEFINE_SRCU(shrinker_srcu);
 
 #ifdef CONFIG_MEMCG
 static int shrinker_nr_max;
@@ -706,7 +707,7 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
 void register_shrinker_prepared(struct shrinker *shrinker)
 {
 	down_write(&shrinker_rwsem);
-	list_add_tail(&shrinker->list, &shrinker_list);
+	list_add_tail_rcu(&shrinker->list, &shrinker_list);
 	shrinker->flags |= SHRINKER_REGISTERED;
 	shrinker_debugfs_add(shrinker);
 	up_write(&shrinker_rwsem);
@@ -760,13 +761,15 @@ void unregister_shrinker(struct shrinker *shrinker)
 		return;
 
 	down_write(&shrinker_rwsem);
-	list_del(&shrinker->list);
+	list_del_rcu(&shrinker->list);
 	shrinker->flags &= ~SHRINKER_REGISTERED;
 	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
 		unregister_memcg_shrinker(shrinker);
 	debugfs_entry = shrinker_debugfs_remove(shrinker);
 	up_write(&shrinker_rwsem);
 
+	synchronize_srcu(&shrinker_srcu);
+
 	debugfs_remove_recursive(debugfs_entry);
 
 	kfree(shrinker->nr_deferred);
@@ -786,6 +789,7 @@ void synchronize_shrinkers(void)
 {
 	down_write(&shrinker_rwsem);
 	up_write(&shrinker_rwsem);
+	synchronize_srcu(&shrinker_srcu);
 }
 EXPORT_SYMBOL(synchronize_shrinkers);
 
@@ -996,6 +1000,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 {
 	unsigned long ret, freed = 0;
 	struct shrinker *shrinker;
+	int srcu_idx;
 
 	/*
 	 * The root memcg might be allocated even though memcg is disabled
@@ -1007,10 +1012,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
 		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		goto out;
+	srcu_idx = srcu_read_lock(&shrinker_srcu);
 
-	list_for_each_entry(shrinker, &shrinker_list, list) {
+	list_for_each_entry_srcu(shrinker, &shrinker_list, list,
+				 srcu_read_lock_held(&shrinker_srcu)) {
 		struct shrink_control sc = {
 			.gfp_mask = gfp_mask,
 			.nid = nid,
@@ -1021,19 +1026,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 		if (ret == SHRINK_EMPTY)
 			ret = 0;
 		freed += ret;
-		/*
-		 * Bail out if someone want to register a new shrinker to
-		 * prevent the registration from being stalled for long periods
-		 * by parallel ongoing shrinking.
-		 */
-		if (rwsem_is_contended(&shrinker_rwsem)) {
-			freed = freed ? : 1;
-			break;
-		}
 	}
 
-	up_read(&shrinker_rwsem);
-out:
+	srcu_read_unlock(&shrinker_srcu, srcu_idx);
 	cond_resched();
 	return freed;
 }
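
For readers less familiar with SRCU, the ordering that unregister_shrinker() relies on in this patch (list_del_rcu(), then synchronize_srcu(), then kfree()) can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions: it is single-threaded, models a single grace period, and uses plain counters where the kernel uses per-CPU state and memory barriers. Every toy_* name is hypothetical and is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of an SRCU-like domain; all names hypothetical. */
struct toy_srcu {
	int idx;        /* index that new readers use: 0 or 1 */
	int readers[2]; /* active read-side sections per index */
};

/* Enter a read-side section; the returned index must be passed to unlock. */
static int toy_srcu_read_lock(struct toy_srcu *sp)
{
	int idx = sp->idx;

	sp->readers[idx]++;
	return idx;
}

/* Leave the read-side section that was entered with index idx. */
static void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
	sp->readers[idx]--;
}

/* Updater side: redirect new readers to the other index, return the old one. */
static int toy_srcu_flip(struct toy_srcu *sp)
{
	int old = sp->idx;

	sp->idx = !old;
	return old;
}

/* True once every reader that entered under the old index has left. */
static bool toy_srcu_gp_done(const struct toy_srcu *sp, int old)
{
	return sp->readers[old] == 0;
}

/* Walk through the unregister-style ordering with one old and one new reader. */
static void toy_srcu_demo(void)
{
	struct toy_srcu s = { 0, { 0, 0 } };
	int r1, r2, old;

	r1 = toy_srcu_read_lock(&s);        /* reader is walking the list */
	old = toy_srcu_flip(&s);            /* updater has unlinked the entry */
	assert(!toy_srcu_gp_done(&s, old)); /* old reader active: do not free */

	r2 = toy_srcu_read_lock(&s);        /* later reader gets the new index */
	assert(r2 != r1);

	toy_srcu_read_unlock(&s, r1);
	assert(toy_srcu_gp_done(&s, old));  /* old readers gone: free is safe */

	toy_srcu_read_unlock(&s, r2);
}
```

Calling toy_srcu_demo() from a main() runs the scenario. The point of the model is that the reader that started after the flip does not delay the grace period, and readers never take a shared lock that a writer could contend on, which is the property shrink_slab() gains from srcu_read_lock().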