[8/8] rcu/exp: Remove rcu_par_gp_wq

TREE04 running on short iterations can produce writer stalls of the
following kind:

 ??? Writer stall state RTWS_EXP_SYNC(4) g3968 f0x0 ->state 0x2 cpu 0
 task:rcu_torture_wri state:D stack:14568 pid:83    ppid:2      flags:0x00004000
 Call Trace:
  <TASK>
  __schedule+0x2de/0x850
  ? trace_event_raw_event_rcu_exp_funnel_lock+0x6d/0xb0
  schedule+0x4f/0x90
  synchronize_rcu_expedited+0x430/0x670
  ? __pfx_autoremove_wake_function+0x10/0x10
  ? __pfx_synchronize_rcu_expedited+0x10/0x10
  do_rtws_sync.constprop.0+0xde/0x230
  rcu_torture_writer+0x4b4/0xcd0
  ? __pfx_rcu_torture_writer+0x10/0x10
  kthread+0xc7/0xf0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2f/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

Waiting for an expedited grace period and polling for one both rely
internally on the same workqueue (rcu_gp_wq) to perform the necessary
asynchronous work.

However, the two operations depend on each other, as depicted below:

       ====== CPU 0 =======                          ====== CPU 1 =======

                                                     synchronize_rcu_expedited()
                                                         exp_funnel_lock()
                                                             mutex_lock(&rcu_state.exp_mutex);
    start_poll_synchronize_rcu_expedited
        queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
                                                         synchronize_rcu_expedited_queue_work()
                                                             queue_work(rcu_gp_wq, &rew->rew_work);
                                                         wait_event() // A, wait for &rew->rew_work completion
                                                         mutex_unlock() // B
    //======> switch to kworker

    sync_rcu_do_polled_gp() {
        synchronize_rcu_expedited()
            exp_funnel_lock()
                mutex_lock(&rcu_state.exp_mutex); // C, wait B
                ....
    } // D

Since workqueues are usually implemented on top of several kworkers
handling the queue concurrently, the above situation doesn't deadlock
most of the time because A then doesn't depend on D. Under memory
stress, however, a single kworker may end up handling all the work
items alone, one after another. In that case the above layout becomes
a problem because A then waits for D, closing a circular dependency:

	A -> D -> C -> B -> A
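
The chain above can be checked mechanically. What follows is only a
userspace sketch, not kernel code: the event labels A-D come from the
diagram, while the helper names (waits_for, reaches, has_cycle,
exp_gp_deadlocks) are made up for illustration. It builds the wait-for
graph and looks for a cycle, with the A -> D edge present only when a
single kworker serializes both work items:

```c
#include <stdbool.h>

/* Events from the diagram; the names are labels for this sketch only. */
enum { A, B, C, D, NR_EVENTS };

/* waits_for[x][y] is true when event x cannot complete before event y. */
static bool waits_for[NR_EVENTS][NR_EVENTS];

/* Depth-first search: is 'target' reachable from 'from'? */
static bool reaches(int from, int target, bool *seen)
{
	for (int next = 0; next < NR_EVENTS; next++) {
		if (!waits_for[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (reaches(next, target, seen))
				return true;
		}
	}
	return false;
}

/* Returns true if any event transitively waits on itself. */
static bool has_cycle(void)
{
	for (int ev = 0; ev < NR_EVENTS; ev++) {
		bool seen[NR_EVENTS] = { false };

		if (reaches(ev, ev, seen))
			return true;
	}
	return false;
}

/* Build the wait-for graph; single_kworker adds the A -> D edge. */
static bool exp_gp_deadlocks(bool single_kworker)
{
	for (int i = 0; i < NR_EVENTS; i++)
		for (int j = 0; j < NR_EVENTS; j++)
			waits_for[i][j] = false;

	waits_for[D][C] = true;	/* polled work returns only once it got the mutex */
	waits_for[C][B] = true;	/* mutex_lock() waits for mutex_unlock() */
	waits_for[B][A] = true;	/* unlock happens only after wait_event() returns */
	if (single_kworker)
		waits_for[A][D] = true;	/* rew_work is queued behind the polled work */

	return has_cycle();
}
```

In this model exp_gp_deadlocks(true) finds the cycle, whereas
exp_gp_deadlocks(false) does not, because A no longer depends on D once
the two work items are served by different workers.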

However, this only happens when CONFIG_RCU_EXP_KTHREAD=n. Otherwise
synchronize_rcu_expedited() is implemented on top of a kthread worker
while polling still relies on the rcu_gp_wq workqueue, which breaks the
above circular dependency chain.

Fix this by making expedited grace periods always rely on a kthread
worker. The workqueue-based implementation is essentially a duplicate
anyway now that the per-node initialization is performed by per-node
kthread workers.

Meanwhile the CONFIG_RCU_EXP_KTHREAD switch is still kept around to
manage the scheduler policy of these kthread workers.

Reported-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Suggested-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcu.h      |  4 ---
 kernel/rcu/tree.c     | 40 ++++---------------------
 kernel/rcu/tree.h     |  6 +---
 kernel/rcu/tree_exp.h | 70 +------------------------------------------
 4 files changed, 8 insertions(+), 112 deletions(-)

Message ID	20231219140843.939329-9-frederic@kernel.org (mailing list archive)
State	New, archived
From	Frederic Weisbecker <frederic@kernel.org>
To	LKML <linux-kernel@vger.kernel.org>
Cc	Frederic Weisbecker <frederic@kernel.org>, Boqun Feng <boqun.feng@gmail.com>, Joel Fernandes <joel@joelfernandes.org>, Neeraj Upadhyay <neeraj.upadhyay@amd.com>, "Paul E . McKenney" <paulmck@kernel.org>, Uladzislau Rezki <urezki@gmail.com>, Zqiang <qiang.zhang1211@gmail.com>, rcu <rcu@vger.kernel.org>, Hillf Danton <hdanton@sina.com>, Anna-Maria Behnsen <anna-maria@linutronix.de>, Thomas Gleixner <tglx@linutronix.de>, Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Date	Tue, 19 Dec 2023 15:08:43 +0100
Series	rcu: Fix expedited GP deadlock (and cleanup some nocb stuff)
	[0/8,v2] rcu: Fix expedited GP deadlock (and cleanup some nocb stuff)
	[1/8] rcu/nocb: Make IRQs disablement symmetric
	[2/8] rcu/nocb: Re-arrange call_rcu() NOCB specific code
	[3/8] rcu/exp: Fix RCU expedited parallel grace period kworker allocation failure recovery
	[4/8] rcu/exp: Handle RCU expedited grace period kworker allocation failure
	[5/8] rcu: s/boost_kthread_mutex/kthread_mutex
	[6/8] rcu/exp: Make parallel exp gp kworker per rcu node
	[7/8] rcu/exp: Handle parallel exp gp kworkers affinity
	[8/8] rcu/exp: Remove rcu_par_gp_wq
