From patchwork Thu Mar 30 09:45:19 2023
X-Patchwork-Submitter: Ziwei Dai
X-Patchwork-Id: 13193736
From: Ziwei Dai <Ziwei.Dai@unisoc.com>
To: paulmck@kernel.org, frederic@kernel.org, quic_neeraju@quicinc.com,
 josh@joshtriplett.org, rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
 jiangshanlai@gmail.com, joel@joelfernandes.org, rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Shuang Wang, Yifan Xin, Ke Wang,
 Xuewen Yan, Zhiguo Niu, Zhaoyang Huang
Subject: Reply: [PATCH] rcu: Make sure new krcp free business is handled
 after the wanted rcu grace period.
Date: Thu, 30 Mar 2023 09:45:19 +0000
Message-ID: <76db405712174b20a1caef47acd1cda7@BJMBX01.spreadtrum.com>
References: <1680168440-20511-1-git-send-email-ziwei.dai@unisoc.com>
In-Reply-To: <1680168440-20511-1-git-send-email-ziwei.dai@unisoc.com>
List-ID: rcu@vger.kernel.org

Hi Uladzislau and all,

Sorry for the disclaimer in the original mail. Please comment in this new
thread instead.

We found this issue on kernel 5.15 and are trying to fix it there; mainline
appears to have the same issue. Below is the first debug patch, which is
running under stress test on a 5.15 device; the issue has not reproduced
so far.

============================================================

> -----Original Message-----
> From: Ziwei Dai <Ziwei.Dai@unisoc.com>
> Sent: Thursday, March 30, 2023 17:27
> To: paulmck@kernel.org; frederic@kernel.org; quic_neeraju@quicinc.com;
> josh@joshtriplett.org; rostedt@goodmis.org;
> mathieu.desnoyers@efficios.com; jiangshanlai@gmail.com;
> joel@joelfernandes.org; rcu@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; Shuang Wang; Yifan Xin; Ke Wang;
> Xuewen Yan; Zhiguo Niu; Ziwei Dai; Zhaoyang Huang
> Subject: [PATCH] rcu: Make sure new krcp free business is handled after
> the wanted rcu grace period.
>
> From: Ziwei Dai
>
> In kfree_rcu_monitor(), new free business at krcp is attached to any free
> channel at krwp. kfree_rcu_monitor() is responsible for making sure the
> new free business is handled after the rcu grace period. But if any
> non-free channel at krwp exists already, that means there is on-going rcu
> work, which causes the kvfree_call_rcu()-triggered free business to be
> done before the wanted rcu grace period ends.
>
> This commit skips any krwp that has a non-free channel at
> kfree_rcu_monitor(), to fix the issue that kvfree_call_rcu() loses
> effectiveness.
>
> Below is the css_set object "from_cset" use-after-free issue caused by
> kvfree_call_rcu() losing effectiveness:
> Core 0 calls rcu_read_lock(), then uses "from_cset", then a hard irq comes.
> Core 1 calls kfree_rcu(cset, rcu_head), intending to free "from_cset"
> after a new grace period.
> Core 2 frees "from_cset" after the current grace period ends;
> "from_cset" is reallocated.
> Core 0 references "from_cset"'s member, which causes a crash.
>
> Core 0:
> count_memcg_event_mm()
> |rcu_read_lock()  <---
> |mem_cgroup_from_task()
>  | = rcu_dereference((task)->cgroups)
>  |// is the "from_cset" mentioned on core 1
>  |// Hard irq comes, current task is scheduled out.
>
> Core 1:
> cgroup_attach_task()
> |cgroup_migrate()
> |cgroup_migrate_execute()
> |css_set_move_task(task, from_cset, to_cset, true)
> |cgroup_move_task(task, to_cset)
> |rcu_assign_pointer(.., to_cset)
> |...
> |cgroup_migrate_finish()
> |put_css_set_locked(from_cset)
> |from_cset->refcount returns 0
> |kfree_rcu(cset, rcu_head)  <--- means to free from_cset after new gp
> |add_ptr_to_bulk_krc_lock()
> |schedule_delayed_work(&krcp->monitor_work, ..)
>
> kfree_rcu_monitor()
> |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
> |queue_rcu_work(system_wq, &krwp->rcu_work)
> |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
> |call_rcu(&rwork->rcu, rcu_work_rcufn)  <--- request a new gp
>
> // There is a previous call_rcu(.., rcu_work_rcufn);
> // that gp ends, so rcu_work_rcufn() is called.
> rcu_work_rcufn()
> |__queue_work(.., rwork->wq, &rwork->work);
>
> Core 2:
> // or there is a pending kfree_rcu_work() work called.
> |kfree_rcu_work()
> |krwp->bulk_head_free[0] bulk is freed before the new gp ends!!!
> |The "from_cset" mentioned on core 1 is freed before the new gp ends.
>
> Core 0:
> // the task is scheduled in after many ms.
> |->subsys[(subsys_id)  <--- causes a kernel crash, because "from_cset"
>                             is freed.
>
> Signed-off-by: Ziwei Dai
>
> ---
>  kernel/rcu/tree.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8e880c0..f6451a8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3107,15 +3107,16 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  	for (i = 0; i < KFREE_N_BATCHES; i++) {
>  		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
>
> -		// Try to detach bulk_head or head and attach it over any
> -		// available corresponding free channel. It can be that
> -		// a previous RCU batch is in progress, it means that
> -		// immediately to queue another one is not possible so
> -		// in that case the monitor work is rearmed.
> -		if ((!list_empty(&krcp->bulk_head[0]) && list_empty(&krwp->bulk_head_free[0])) ||
> -			(!list_empty(&krcp->bulk_head[1]) && list_empty(&krwp->bulk_head_free[1])) ||
> -				(READ_ONCE(krcp->head) && !krwp->head_free)) {
> -
> +		// Try to detach bulk_head or head and attach it, only when
> +		// all channels are free. Any channel is not free means at krwp
> +		// there is on-going rcu work to handle krwp's free business.
> +		if (!list_empty(&krwp->bulk_head_free[0]) ||
> +		    !list_empty(&krwp->bulk_head_free[1]) ||
> +		    krwp->head_free)
> +			continue;
> +		if (!list_empty(&krcp->bulk_head[0]) ||
> +		    !list_empty(&krcp->bulk_head[1]) ||
> +		    READ_ONCE(krcp->head)) {
>  			// Channel 1 corresponds to the SLAB-pointer bulk path.
>  			// Channel 2 corresponds to vmalloc-pointer bulk path.
>  			for (j = 0; j < FREE_N_CHANNELS; j++) {
> --
> 1.9.1

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 66951e130c2fc..44759641f7234 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3342,15 +3342,21 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// Attempt to start a new batch.
 	for (i = 0; i < KFREE_N_BATCHES; i++) {
 		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
+		bool rcu_work_pending;

 		// Try to detach bkvhead or head and attach it over any
 		// available corresponding free channel. It can be that
 		// a previous RCU batch is in progress, it means that
 		// immediately to queue another one is not possible so
 		// in that case the monitor work is rearmed.
-		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
-			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
-				(krcp->head && !krwp->head_free)) {
+		rcu_work_pending = test_bit(
+			WORK_STRUCT_PENDING_BIT,
+			work_data_bits(&krwp->rcu_work.work));
+		// If there is on-going rcu work, continue.
+		if (rcu_work_pending || krwp->bkvhead_free[0] ||
+		    krwp->bkvhead_free[1] || krwp->head_free)
+			continue;
+		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head) {
 			// Channel 1 corresponds to the SLAB-pointer bulk path.
 			// Channel 2 corresponds to vmalloc-pointer bulk path.
 			for (j = 0; j < FREE_N_CHANNELS; j++) {

As the "rcu_work_pending" check seems redundant, I made the second patch
below on k5.15. We will stress test it as well.

============================================================

Below is the second patch on the k5.15 device.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 66951e130c2fc..f219c60a8ec30 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3343,14 +3343,13 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	for (i = 0; i < KFREE_N_BATCHES; i++) {
 		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);

-		// Try to detach bkvhead or head and attach it over any
-		// available corresponding free channel. It can be that
-		// a previous RCU batch is in progress, it means that
-		// immediately to queue another one is not possible so
-		// in that case the monitor work is rearmed.
-		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
-			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
-				(krcp->head && !krwp->head_free)) {
+		// Try to detach bulk_head or head and attach it, only when
+		// all channels are free. Any channel is not free means at krwp
+		// there is on-going rcu work to handle krwp's free business.
+		if (krwp->bkvhead_free[0] || krwp->bkvhead_free[1] ||
+		    krwp->head_free)
+			continue;
+		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head) {
 			// Channel 1 corresponds to the SLAB-pointer bulk path.
 			// Channel 2 corresponds to vmalloc-pointer bulk path.
 			for (j = 0; j < FREE_N_CHANNELS; j++) {