From patchwork Fri Apr 29 13:35:36 2022
X-Patchwork-Submitter: Qi Zheng <zhengqi.arch@bytedance.com>
X-Patchwork-Id: 12832007
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: akpm@linux-foundation.org, tglx@linutronix.de,
    kirill.shutemov@linux.intel.com,
    mika.penttila@nextfour.com, david@redhat.com, jgg@nvidia.com,
    tj@kernel.org, dennis@kernel.org, ming.lei@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, songmuchun@bytedance.com,
    zhouchengming@bytedance.com, Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [RFC PATCH 02/18] percpu_ref: make ref stable after percpu_ref_switch_to_atomic_sync() returns
Date: Fri, 29 Apr 2022 21:35:36 +0800
Message-Id: <20220429133552.33768-3-zhengqi.arch@bytedance.com>
In-Reply-To: <20220429133552.33768-1-zhengqi.arch@bytedance.com>
References: <20220429133552.33768-1-zhengqi.arch@bytedance.com>

In percpu_ref_call_confirm_rcu(), wake_up_all() is called before
percpu_ref_put(), which leaves the value of the percpu_ref unstable
when percpu_ref_switch_to_atomic_sync() returns:

	CPU0                                CPU1

	percpu_ref_switch_to_atomic_sync(&ref)
	--> percpu_ref_switch_to_atomic(&ref)
	    --> percpu_ref_get(ref);  /* put after confirmation */
	        call_rcu(&ref->data->rcu,
	                 percpu_ref_switch_to_atomic_rcu);

	                                    percpu_ref_switch_to_atomic_rcu
	                                    --> percpu_ref_call_confirm_rcu
	                                        --> data->confirm_switch = NULL;
	                                            wake_up_all(&percpu_ref_switch_waitq);

	    /* here waiting to wake up */
	    wait_event(percpu_ref_switch_waitq,
	               !ref->data->confirm_switch);

	                                        (A) percpu_ref_put(ref);

	/* The value of &ref is unstable! */
	percpu_ref_is_zero(&ref)

	                                        (B) percpu_ref_put(ref);

As shown above, assuming that the counts on each CPU add up to 0 before
percpu_ref_switch_to_atomic_sync() is called, we expect that
percpu_ref_is_zero() returns true once the ref has switched to atomic
mode. In fact, it returns different values in cases A and B, which is
not what we expect.

There are currently two users of percpu_ref_switch_to_atomic_sync() in
the kernel:

 i.  mddev->writes_pending in drivers/md/md.c
 ii. q->q_usage_counter in block/blk-pm.c

Both follow the pattern shown above. In the worst case, case B occurs
every time and percpu_ref_is_zero() never holds. While this is unlikely
to occur in a production environment, it is still a problem.

This patch moves the final percpu_ref_put() out of the RCU handler and
calls it after wait_event(), which makes the ref stable by the time
percpu_ref_switch_to_atomic_sync() returns. In the example above,
percpu_ref_is_zero() then sees a steady zero value, as expected.
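
With the fix applied, the pattern used by both callers becomes
reliable. Roughly (a minimal sketch, not code from either driver;
ref_is_drained() is a hypothetical helper, only the percpu_ref_* calls
are the real API):

	#include <linux/percpu-refcount.h>

	/* Hypothetical helper illustrating the caller pattern above. */
	static bool ref_is_drained(struct percpu_ref *ref)
	{
		/*
		 * Switch to atomic mode and wait for the switch to be
		 * confirmed. With this patch, the internal reference
		 * taken by percpu_ref_switch_to_atomic() has also been
		 * dropped by the time this returns.
		 */
		percpu_ref_switch_to_atomic_sync(ref);

		/*
		 * The atomic count is now stable: this check no longer
		 * races with the percpu_ref_put() in the RCU callback
		 * (cases A/B in the diagram above).
		 */
		return percpu_ref_is_zero(ref);
	}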
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/percpu-refcount.h |  4 ++-
 lib/percpu-refcount.c           | 56 +++++++++++++++++++++++----------
 2 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..75844939a965 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -98,6 +98,7 @@ struct percpu_ref_data {
 	percpu_ref_func_t *confirm_switch;
 	bool force_atomic:1;
 	bool allow_reinit:1;
+	bool sync:1;
 	struct rcu_head rcu;
 	struct percpu_ref *ref;
 };
@@ -123,7 +124,8 @@ int __must_check percpu_ref_init(struct percpu_ref *ref,
 				 gfp_t gfp);
 void percpu_ref_exit(struct percpu_ref *ref);
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
-				 percpu_ref_func_t *confirm_switch);
+				 percpu_ref_func_t *confirm_switch,
+				 bool sync);
 void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref);
 void percpu_ref_switch_to_percpu(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index af9302141bcf..3a8906715e09 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -99,6 +99,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release,
 	data->release = release;
 	data->confirm_switch = NULL;
 	data->ref = ref;
+	data->sync = false;
 	ref->data = data;
 	return 0;
 }
@@ -146,21 +147,33 @@ void percpu_ref_exit(struct percpu_ref *ref)
 }
 EXPORT_SYMBOL_GPL(percpu_ref_exit);
 
+static inline void percpu_ref_switch_to_atomic_post(struct percpu_ref *ref)
+{
+	struct percpu_ref_data *data = ref->data;
+
+	if (!data->allow_reinit)
+		__percpu_ref_exit(ref);
+
+	/* drop ref from percpu_ref_switch_to_atomic() */
+	percpu_ref_put(ref);
+}
+
 static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
 {
 	struct percpu_ref_data *data = container_of(rcu,
 			struct percpu_ref_data, rcu);
 	struct percpu_ref *ref = data->ref;
+	bool need_put = true;
+
+	if (data->sync)
+		need_put = data->sync = false;
 
 	data->confirm_switch(ref);
 	data->confirm_switch = NULL;
 	wake_up_all(&percpu_ref_switch_waitq);
 
-	if (!data->allow_reinit)
-		__percpu_ref_exit(ref);
-
-	/* drop ref from percpu_ref_switch_to_atomic() */
-	percpu_ref_put(ref);
+	if (need_put)
+		percpu_ref_switch_to_atomic_post(ref);
 }
 
 static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
@@ -210,14 +223,19 @@ static void percpu_ref_noop_confirm_switch(struct percpu_ref *ref)
 }
 
 static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
-					  percpu_ref_func_t *confirm_switch)
+					  percpu_ref_func_t *confirm_switch,
+					  bool sync)
 {
 	if (ref->percpu_count_ptr & __PERCPU_REF_ATOMIC) {
 		if (confirm_switch)
 			confirm_switch(ref);
+		if (sync)
+			percpu_ref_get(ref);
 		return;
 	}
 
+	ref->data->sync = sync;
+
 	/* switching from percpu to atomic */
 	ref->percpu_count_ptr |= __PERCPU_REF_ATOMIC;
 
@@ -232,13 +250,16 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 	call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
 }
 
-static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
+static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref, bool sync)
 {
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
 	int cpu;
 
 	BUG_ON(!percpu_count);
 
+	if (sync)
+		percpu_ref_get(ref);
+
 	if (!(ref->percpu_count_ptr & __PERCPU_REF_ATOMIC))
 		return;
 
@@ -261,7 +282,8 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 }
 
 static void __percpu_ref_switch_mode(struct percpu_ref *ref,
-				     percpu_ref_func_t *confirm_switch)
+				     percpu_ref_func_t *confirm_switch,
+				     bool sync)
 {
 	struct percpu_ref_data *data = ref->data;
 
@@ -276,9 +298,9 @@ static void __percpu_ref_switch_mode(struct percpu_ref *ref,
 			    percpu_ref_switch_lock);
 
 	if (data->force_atomic || percpu_ref_is_dying(ref))
-		__percpu_ref_switch_to_atomic(ref, confirm_switch);
+		__percpu_ref_switch_to_atomic(ref, confirm_switch, sync);
 	else
-		__percpu_ref_switch_to_percpu(ref);
+		__percpu_ref_switch_to_percpu(ref, sync);
 }
 
 /**
@@ -302,14 +324,15 @@ static void __percpu_ref_switch_mode(struct percpu_ref *ref,
  * switching to atomic mode, this function can be called from any context.
  */
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
-				 percpu_ref_func_t *confirm_switch)
+				 percpu_ref_func_t *confirm_switch,
+				 bool sync)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
 
 	ref->data->force_atomic = true;
-	__percpu_ref_switch_mode(ref, confirm_switch);
+	__percpu_ref_switch_mode(ref, confirm_switch, sync);
 
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
@@ -325,8 +348,9 @@ EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic);
  */
 void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref)
 {
-	percpu_ref_switch_to_atomic(ref, NULL);
+	percpu_ref_switch_to_atomic(ref, NULL, true);
 	wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
+	percpu_ref_switch_to_atomic_post(ref);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync);
 
@@ -355,7 +379,7 @@ void percpu_ref_switch_to_percpu(struct percpu_ref *ref)
 	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
 
 	ref->data->force_atomic = false;
-	__percpu_ref_switch_mode(ref, NULL);
+	__percpu_ref_switch_mode(ref, NULL, false);
 
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
@@ -390,7 +414,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 		  ref->data->release);
 
 	ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
-	__percpu_ref_switch_mode(ref, confirm_kill);
+	__percpu_ref_switch_mode(ref, confirm_kill, false);
 	percpu_ref_put(ref);
 
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
@@ -470,7 +494,7 @@ void percpu_ref_resurrect(struct percpu_ref *ref)
 
 	ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
 	percpu_ref_get(ref);
-	__percpu_ref_switch_mode(ref, NULL);
+	__percpu_ref_switch_mode(ref, NULL, false);
 
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
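
With the signature change, a direct caller of
percpu_ref_switch_to_atomic() passes the new sync argument explicitly;
passing false preserves the existing behavior, where the internal
reference is still dropped from the RCU callback. For example (a
hypothetical caller; my_confirm_switch is a stand-in name for the
caller's confirm callback):

	/*
	 * Asynchronous switch, behavior unchanged by this patch: the
	 * ref taken internally is still put from the RCU callback.
	 */
	percpu_ref_switch_to_atomic(ref, my_confirm_switch, false);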