From patchwork Wed Jul 27 11:49:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12930387
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1B968C04A68
	for ; Wed, 27 Jul 2022 11:49:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232481AbiG0LtU (ORCPT );
	Wed, 27 Jul 2022 07:49:20 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37214 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232499AbiG0LtQ (ORCPT );
	Wed, 27 Jul 2022 07:49:16 -0400
Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67BE74AD78
	for ; Wed, 27 Jul 2022 04:49:13 -0700 (PDT)
From: Sebastian Andrzej Siewior
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020; t=1658922551;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=l2X2DqDoeVLNPuGgYJPU21ZoOObmpTzKokmqtTV7SfU=;
	b=kX5EgXzxRRECz7jKIFEP/hsLUb0CV0p141JOd/QfZwhqNUKIooUzw3X7XoZ0aMq8lFXzui
	 U5zCOTb5tIPlb3z/SfAR6x83w9vpwiWWKw4ZlsVzJHlD7rEVra0tE5GkNBKzYdCc5rLUZn
	 /uneJWJUHA9DZu/KqcBxq/MVViXbmgXHoem7eQNHcIRljoQpYP2flQ2TsAVyMrh1dqTirQ
	 dhUZt9nvRSZcZAaqVxnbuy5OQB8jGEpky6++Tv8wpyI+ux1e9zeRAE35yYb4YQh0gCLfqQ
	 B8T2mGZVFzaDQCnM7PEfg85watjCRGJFNaPmSFZdNeOEAY+hEXqeKyfmJqWMqQ==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020e; t=1658922551;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=l2X2DqDoeVLNPuGgYJPU21ZoOObmpTzKokmqtTV7SfU=;
	b=tDa/NobMm33dK6f5onLnvWViwfdJPf6DfPZXd/n6UaF58XGLzaW6CxWz2/Qi5cFD7jXPBo
	 4H3tJavv/dv2MWDA==
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro , Matthew Wilcox , Thomas Gleixner , Sebastian Andrzej Siewior
Subject: [PATCH 1/4 v2] fs/dcache: d_add_ci() needs to complete parallel lookup.
Date: Wed, 27 Jul 2022 13:49:01 +0200
Message-Id: <20220727114904.130761-2-bigeasy@linutronix.de>
In-Reply-To: <20220727114904.130761-1-bigeasy@linutronix.de>
References: <20220727114904.130761-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Al Viro

Result of d_alloc_parallel() in d_add_ci() is fed to
d_splice_alias(), which *NORMALLY* feeds it to __d_add() or
__d_move() in a way that will have __d_lookup_done() applied to it.

However, there is a nasty possibility - d_splice_alias() might
legitimately fail without having marked the sucker not in-lookup.
dentry will get dropped by d_add_ci(), so ->d_wait won't end up
pointing to freed object, but it's still a bug - retain_dentry()
will scream bloody murder upon seeing that, and for a good reason;
we'll get hash chain corrupted. It's impossible to hit without
corrupted fs image (ntfs or case-insensitive xfs), but it's a bug.

Invoke d_lookup_done() after d_splice_alias() to ensure that the
in-lookup flag is always cleared.

Fixes: d9171b9345261 ("parallel lookups machinery, part 4 (and last)")
Signed-off-by: Al Viro
Signed-off-by: Sebastian Andrzej Siewior
---
 fs/dcache.c | 1 +
 1 file changed, 1 insertion(+)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2239,6 +2239,7 @@ struct dentry *d_add_ci(struct dentry *d
 		}
 	}
 	res = d_splice_alias(inode, found);
+	d_lookup_done(found);
 	if (res) {
 		dput(found);
 		return res;

From patchwork Wed Jul 27 11:49:02 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12930385
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 017D6C04A68
	for ; Wed, 27 Jul 2022 11:49:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231226AbiG0LtS (ORCPT );
	Wed, 27 Jul 2022 07:49:18 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37200 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232482AbiG0LtP (ORCPT );
	Wed, 27 Jul 2022 07:49:15 -0400
Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 683894AD7B
	for ; Wed, 27 Jul 2022 04:49:13 -0700 (PDT)
From: Sebastian Andrzej Siewior
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020; t=1658922552;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=8H4RHHLP/IC4duyp+ci7tktfckRg+ea6zo6aVOrzr/4=;
	b=K5b7X5hiFu6JtSxqKWdUXdwgHOKSw2ACh0cXDIlPx2iNlWJF1YOZJ2TEHO1aybrZ5Zx92o
	 XjeSWTd8Plq8u9jBnCCjYUmlasUl5/BsK7pGcgj1SEy25wtPz9euE0xTrcQdUqpJUxp5bn
	 kC/K6bqSrCFo5zDvXkcqbarhveQlBj5klJ07YiAqn7fc2yYdK99e7m9gZAqRCbsGewmVxn
	 XLEVCrtKzu2E0foQD3RLfh9h0KP5hTDhVK81X1pmumno9QrLiPoHJZ+geVaHlgAEGwOeNM
	 YYsUePOPA4TyEb5mJw3EWSJuO/qjTPiB53vBwsYh5NkzHBpKJqNr7WK7fuTbsg==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020e; t=1658922552;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=8H4RHHLP/IC4duyp+ci7tktfckRg+ea6zo6aVOrzr/4=;
	b=bIKZ5GdVndf3XmdOYYvPPHAjPeHNYSgOdG7DbKChuYtr9c7k/+vk9ShGQS//3iXuTxh2Sz
	 Ez75s8YZcAVjcrCQ==
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro , Matthew Wilcox , Thomas Gleixner , Sebastian Andrzej Siewior , Oleg.Karfich@wago.com
Subject: [PATCH 2/4 v2] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
Date: Wed, 27 Jul 2022 13:49:02 +0200
Message-Id: <20220727114904.130761-3-bigeasy@linutronix.de>
In-Reply-To: <20220727114904.130761-1-bigeasy@linutronix.de>
References: <20220727114904.130761-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

i_dir_seq is a sequence counter with a lock which is represented by the
lowest bit. The writer atomically updates the counter which ensures that it
can be modified by only one writer at a time. This requires preemption to
be disabled across the write side critical section.

On !PREEMPT_RT kernels this is implied by the caller acquiring
dentry::d_lock. On PREEMPT_RT kernels spin_lock() does not disable
preemption which means that a preempting writer or reader would live lock.
It's therefore required to disable preemption explicitly.

An alternative solution would be to replace i_dir_seq with a seqlock_t for
PREEMPT_RT, but that comes with its own set of problems due to arbitrary
lock nesting. A pure sequence count with an associated spinlock is not
possible because the locks held by the caller are not necessarily related.

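For illustration only, and not part of this patch: the i_dir_seq protocol
described above can be sketched in userspace, assuming C11 atomics as
stand-ins for cmpxchg() and smp_store_release(). The sketch_* names and
struct sketch_dir are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for inode::i_dir_seq; this is not the kernel type. */
struct sketch_dir {
	atomic_uint seq;	/* even: no writer, odd: a writer is inside */
};

/*
 * Enter the write side: wait until the counter is even, then claim it by
 * moving it to the next odd value. Returns the even value that was claimed.
 */
static unsigned sketch_start_dir_add(struct sketch_dir *dir)
{
	for (;;) {
		unsigned n = atomic_load_explicit(&dir->seq,
						  memory_order_relaxed);

		/* Lowest bit clear means unlocked; try to take it. */
		if (!(n & 1) &&
		    atomic_compare_exchange_strong(&dir->seq, &n, n + 1))
			return n;
		/* Otherwise busy-wait until the current writer is done. */
	}
}

/* Leave the write side: publish the next even value with release ordering. */
static void sketch_end_dir_add(struct sketch_dir *dir, unsigned n)
{
	assert(atomic_load(&dir->seq) == n + 1);
	atomic_store_explicit(&dir->seq, n + 2, memory_order_release);
}

/* Non-blocking probe used only by this example. */
static int sketch_writer_active(struct sketch_dir *dir)
{
	return atomic_load(&dir->seq) & 1;
}
```

The loop in sketch_start_dir_add() only terminates once the current owner
reaches sketch_end_dir_add(). If a higher priority task preempts the owner
on the same CPU and then spins in that loop, the owner may never get to
run, which is the live lock scenario the explicit preempt_disable()
addresses on PREEMPT_RT.
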
As the critical section is small, disabling preemption is a sensible
solution.

Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Link: https://lkml.kernel.org/r/20220613140712.77932-2-bigeasy@linutronix.de
---
 fs/dcache.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2564,7 +2564,15 @@ EXPORT_SYMBOL(d_rehash);
 
 static inline unsigned start_dir_add(struct inode *dir)
 {
-
+	/*
+	 * The caller holds a spinlock (dentry::d_lock). On !PREEMPT_RT
+	 * kernels spin_lock() implicitly disables preemption, but not on
+	 * PREEMPT_RT. So for RT it has to be done explicitly to protect
+	 * the sequence count write side critical section against a reader
+	 * or another writer preempting, which would result in a live lock.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	for (;;) {
 		unsigned n = dir->i_dir_seq;
 		if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
@@ -2576,6 +2584,8 @@ static inline unsigned start_dir_add(str
 static inline void end_dir_add(struct inode *dir, unsigned n)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static void d_wait_lookup(struct dentry *dentry)

From patchwork Wed Jul 27 11:49:03 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12930386
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 49C61C19F29
	for ; Wed, 27 Jul 2022 11:49:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232457AbiG0LtT (ORCPT );
	Wed, 27 Jul 2022 07:49:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37196 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232481AbiG0LtP (ORCPT );
	Wed, 27 Jul 2022 07:49:15 -0400
Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D4584AD76
	for ; Wed, 27 Jul 2022 04:49:13 -0700 (PDT)
From: Sebastian Andrzej Siewior
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020; t=1658922552;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=hyjTnpv2RVelFnCEX6pHt5pzPjSU90B+hL5osGhj+Vc=;
	b=XMq05kgNEkR9LR1BErUzhQ+5t4VmDv6RiS0KbJAYA6IZGOGvnce7uKi39SHtge61693nzk
	 40bO+P9FbsmpDPXmVnaN+bn7nXlCBreGoZyuMQ7CqsuhQ5utcvWmeO0kx93bCoqD/pjtQD
	 7Ie//Lb1xZjTugbxahmJx0L2Bhi3ZdFBWXke60S8ok0N+gw7YfkL+ce9w3cT94TF/8XEsM
	 wAcDGISmnK0mxp52XUmaKJpQtHga/i9BawBnQFcFV6dH/PhViKMqYL71noE8Bl5vXwB1a9
	 AAKT4QBDzhGIoDAh7tP99KvSrpS0kNNzLSnZ8KupSx6heYKIL9yNKUSjpO3yFA==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020e; t=1658922552;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=hyjTnpv2RVelFnCEX6pHt5pzPjSU90B+hL5osGhj+Vc=;
	b=g5WSONi6tQyKSKNDLWLnfX7XY3bAyfKYVWI3Zvc3BhAeIpu//av6TDk8GMZ5mjY+TyhbVU
	 7VgAfVmrSbKNtGDA==
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro , Matthew Wilcox , Thomas Gleixner , Sebastian Andrzej Siewior
Subject: [PATCH 3/4 v2] fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
Date: Wed, 27 Jul 2022 13:49:03 +0200
Message-Id: <20220727114904.130761-4-bigeasy@linutronix.de>
In-Reply-To: <20220727114904.130761-1-bigeasy@linutronix.de>
References: <20220727114904.130761-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

__d_lookup_done() wakes waiters on dentry->d_wait. On PREEMPT_RT we are
not allowed to do that with preemption disabled, since the wakeup
acquires wait_queue_head::lock, which is a "sleeping" spinlock on RT.

Calling it under dentry->d_lock is not a problem, since that is also a
"sleeping" spinlock on the same configs. Unfortunately, two of its
callers (__d_add() and __d_move()) are holding more than just ->d_lock
and that needs to be dealt with.

The key observation is that the wakeup can be moved to any point before
dropping ->d_lock.

As a first step to solve this, move the wake up outside of the
hlist_bl_lock() held section.

This is safe because:

Waiters get inserted into ->d_wait only after they'd taken ->d_lock
and observed DCACHE_PAR_LOOKUP in flags. As long as they are
woken up (and evicted from the queue) between the moment __d_lookup_done()
has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe,
since the waitqueue ->d_wait points to won't get destroyed without
having __d_lookup_done(dentry) called (under ->d_lock).

->d_wait is set only by d_alloc_parallel() and only in case when
it returns a freshly allocated in-lookup dentry. Whenever that happens,
we are guaranteed that __d_lookup_done() will be called for resulting
dentry (under ->d_lock) before the wq in question gets destroyed.

With two exceptions wq lives in call frame of the caller of
d_alloc_parallel() and we have an explicit d_lookup_done() on the
resulting in-lookup dentry before we leave that frame.

One of those exceptions is nfs_call_unlink(), where wq is embedded into
(dynamically allocated) struct nfs_unlinkdata.
It is destroyed in nfs_async_unlink_release() after an explicit
d_lookup_done() on the dentry wq went into.

Remaining exception is d_add_ci(). There wq is what we'd found in
->d_wait of d_add_ci() argument. Callers of d_add_ci() are two
instances of ->d_lookup() and they must have been given an in-lookup
dentry. Which means that they'd been called by __lookup_slow() or
lookup_open(), with wq in the call frame of one of those.

Result of d_alloc_parallel() in d_add_ci() is fed to
d_splice_alias(), which either returns non-NULL (and d_add_ci() does
d_lookup_done()) or feeds dentry to __d_add() that will do
__d_lookup_done() under ->d_lock. That concludes the analysis.

Let __d_lookup_unhash():

  1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP
  2) Unhash the dentry
  3) Retrieve and clear dentry::d_wait
  4) Unlock the hash and return the retrieved waitqueue head pointer
  5) Let the caller handle the wake up.
  6) Rename __d_lookup_done() to __d_lookup_unhash_wake() to enforce
     build failures for OOT code that used __d_lookup_done() and is not
     aware of the new return value.

This does not yet solve the PREEMPT_RT problem completely because
preemption is still disabled due to i_dir_seq being held for write. This
will be addressed in subsequent steps.

An alternative solution would be to switch the waitqueue to a simple
waitqueue, but aside from Linus not being a fan of them, moving the wake up
closer to the place where dentry::d_lock is unlocked reduces lock
contention time for the woken up waiter.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Link: https://lkml.kernel.org/r/20220613140712.77932-3-bigeasy@linutronix.de
---
 fs/dcache.c | 35 ++++++++++++++++++++++++++++-------
 include/linux/dcache.h | 9 +++------
 2 files changed, 31 insertions(+), 13 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2712,32 +2712,51 @@ struct dentry *d_alloc_parallel(struct d
 }
 EXPORT_SYMBOL(d_alloc_parallel);
 
-void __d_lookup_done(struct dentry *dentry)
+/*
+ * - Unhash the dentry
+ * - Retrieve and clear the waitqueue head in dentry
+ * - Return the waitqueue head
+ */
+static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
 {
-	struct hlist_bl_head *b = in_lookup_hash(dentry->d_parent,
-						 dentry->d_name.hash);
+	wait_queue_head_t *d_wait;
+	struct hlist_bl_head *b;
+
+	lockdep_assert_held(&dentry->d_lock);
+
+	b = in_lookup_hash(dentry->d_parent, dentry->d_name.hash);
 	hlist_bl_lock(b);
 	dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
 	__hlist_bl_del(&dentry->d_u.d_in_lookup_hash);
-	wake_up_all(dentry->d_wait);
+	d_wait = dentry->d_wait;
 	dentry->d_wait = NULL;
 	hlist_bl_unlock(b);
 	INIT_HLIST_NODE(&dentry->d_u.d_alias);
 	INIT_LIST_HEAD(&dentry->d_lru);
+	return d_wait;
+}
+
+void __d_lookup_unhash_wake(struct dentry *dentry)
+{
+	spin_lock(&dentry->d_lock);
+	wake_up_all(__d_lookup_unhash(dentry));
+	spin_unlock(&dentry->d_lock);
 }
-EXPORT_SYMBOL(__d_lookup_done);
+EXPORT_SYMBOL(__d_lookup_unhash_wake);
 
 /* inode->i_lock held if inode is non-NULL */
 
 static inline void __d_add(struct dentry *dentry, struct inode *inode)
 {
+	wait_queue_head_t *d_wait;
 	struct inode *dir = NULL;
 	unsigned n;
 	spin_lock(&dentry->d_lock);
 	if (unlikely(d_in_lookup(dentry))) {
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
-		__d_lookup_done(dentry);
+		d_wait = __d_lookup_unhash(dentry);
+		wake_up_all(d_wait);
 	}
 	if (inode) {
 		unsigned add_flags = d_flags_for_inode(inode);
@@ -2896,6 +2915,7 @@ static void __d_move(struct dentry *dent
 		     bool exchange)
 {
 	struct dentry *old_parent, *p;
+	wait_queue_head_t *d_wait;
 	struct inode *dir = NULL;
 	unsigned n;
 
@@ -2926,7 +2946,8 @@ static void __d_move(struct dentry *dent
 	if (unlikely(d_in_lookup(target))) {
 		dir = target->d_parent->d_inode;
 		n = start_dir_add(dir);
-		__d_lookup_done(target);
+		d_wait = __d_lookup_unhash(target);
+		wake_up_all(d_wait);
 	}
 
 	write_seqcount_begin(&dentry->d_seq);
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -349,7 +349,7 @@ static inline void dont_mount(struct den
 	spin_unlock(&dentry->d_lock);
 }
 
-extern void __d_lookup_done(struct dentry *);
+extern void __d_lookup_unhash_wake(struct dentry *dentry);
 
 static inline int d_in_lookup(const struct dentry *dentry)
 {
@@ -358,11 +358,8 @@ static inline int d_in_lookup(const stru
 
 static inline void d_lookup_done(struct dentry *dentry)
 {
-	if (unlikely(d_in_lookup(dentry))) {
-		spin_lock(&dentry->d_lock);
-		__d_lookup_done(dentry);
-		spin_unlock(&dentry->d_lock);
-	}
+	if (unlikely(d_in_lookup(dentry)))
+		__d_lookup_unhash_wake(dentry);
 }
 
 extern void dput(struct dentry *);

From patchwork Wed Jul 27 11:49:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12930388
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B7643C19F29
	for ; Wed, 27 Jul 2022 11:49:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232504AbiG0LtW (ORCPT );
	Wed, 27 Jul 2022 07:49:22 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37158 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232500AbiG0LtQ (ORCPT );
	Wed, 27 Jul 2022 07:49:16 -0400
Received: from galois.linutronix.de (Galois.linutronix.de
	[IPv6:2a0a:51c0:0:12e:550::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 684EE4AD7C
	for ; Wed, 27 Jul 2022 04:49:14 -0700 (PDT)
From: Sebastian Andrzej Siewior
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020; t=1658922552;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=A/FOS4ONZOEScBk2DyiWA/nhAEuypXKhytyCAeqticw=;
	b=M6WlP+Q5YZKVsIMBtwPfVjIHFqXYdyLlhgC0e0LPgoeHNfRkLJut0CMORJEHMHfgHwvs/U
	 4d9oeV6naEOBog5kchOkQMA/+AvAUycx5sNQ4m23iUdEYQFk8MuDRhydqP/4gkA91L31t2
	 L/YpthKfdcCDETtregE9ILJVLyJPKbSmD7fRb92Dqg+sAYrOF9QYV6KaeXz5v4I9VwNZV4
	 E8DbKshbjy9cJp0azEqb4y1U4nDVumtjm/f0hsU2qPuEh4Rbm7P2QHnWcJO/8NkgFyORdU
	 cSUMggNY5A/Wm/YC3/YCFKqWupfh9DDq/IADchvVTUcRqyJG+F8vPfGK5UXAxA==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de;
	s=2020e; t=1658922552;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=A/FOS4ONZOEScBk2DyiWA/nhAEuypXKhytyCAeqticw=;
	b=7vqBqG6gM2WhPx5qo0MR3CQFdFhH5Vwd6raYDwrsdHf90ka+hKlXCR387Gm6hElhcBjQYV
	 W5eslVdcypdEXuAQ==
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro , Matthew Wilcox , Thomas Gleixner , Sebastian Andrzej Siewior
Subject: [PATCH 4/4 v2] fs/dcache: Move wakeup out of i_dir_seq write held region.
Date: Wed, 27 Jul 2022 13:49:04 +0200
Message-Id: <20220727114904.130761-5-bigeasy@linutronix.de>
In-Reply-To: <20220727114904.130761-1-bigeasy@linutronix.de>
References: <20220727114904.130761-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

__d_add() and __d_move() wake up waiters on dentry::d_wait from within
the i_dir_seq write held region.
This violates the PREEMPT_RT constraints as the wake up acquires
wait_queue_head::lock which is a "sleeping" spinlock on RT.

There is no requirement to do so. __d_lookup_unhash() has cleared
DCACHE_PAR_LOOKUP and dentry::d_wait and returned the now unreachable wait
queue head pointer to the caller, so the actual wake up can be postponed
until the i_dir_seq write side critical section is left. The only
requirement is that dentry::d_lock is held across the whole sequence
including the wake up. The previous commit includes an analysis of why
this is considered safe.

Move the wake up past end_dir_add() which leaves the i_dir_seq write side
critical section and enables preemption.

For non-RT kernels there is no difference because preemption is still
disabled due to dentry::d_lock being held, but it shortens the time between
wake up and unlocking dentry::d_lock, which reduces the contention for the
woken up waiter.

Signed-off-by: Sebastian Andrzej Siewior
---
 fs/dcache.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2581,11 +2581,13 @@ static inline unsigned start_dir_add(str
 	}
 }
 
-static inline void end_dir_add(struct inode *dir, unsigned n)
+static inline void end_dir_add(struct inode *dir, unsigned int n,
+			       wait_queue_head_t *d_wait)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
 		preempt_enable();
+	wake_up_all(d_wait);
 }
 
 static void d_wait_lookup(struct dentry *dentry)
@@ -2756,7 +2758,6 @@ static inline void __d_add(struct dentry
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
 		d_wait = __d_lookup_unhash(dentry);
-		wake_up_all(d_wait);
 	}
 	if (inode) {
 		unsigned add_flags = d_flags_for_inode(inode);
@@ -2768,7 +2769,7 @@ static inline void __d_add(struct dentry
 	}
 	__d_rehash(dentry);
 	if (dir)
-		end_dir_add(dir, n);
+		end_dir_add(dir, n, d_wait);
 	spin_unlock(&dentry->d_lock);
 	if (inode)
 		spin_unlock(&inode->i_lock);
@@ -2947,7 +2948,6 @@ static void __d_move(struct dentry *dent
 		dir = target->d_parent->d_inode;
 		n = start_dir_add(dir);
 		d_wait = __d_lookup_unhash(target);
-		wake_up_all(d_wait);
 	}
 
 	write_seqcount_begin(&dentry->d_seq);
@@ -2983,7 +2983,7 @@ static void __d_move(struct dentry *dent
 	write_seqcount_end(&dentry->d_seq);
 
 	if (dir)
-		end_dir_add(dir, n);
+		end_dir_add(dir, n, d_wait);
 
 	if (dentry->d_parent != old_parent)
 		spin_unlock(&dentry->d_parent->d_lock);