From patchwork Mon Jun 13 14:07:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12879873
From: Sebastian Andrzej Siewior
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro, Matthew Wilcox, Thomas Gleixner,
 Sebastian Andrzej Siewior, Oleg.Karfich@wago.com
Subject: [PATCH 1/4] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
Date: Mon, 13 Jun 2022 16:07:09 +0200
Message-Id: <20220613140712.77932-2-bigeasy@linutronix.de>
In-Reply-To: <20220613140712.77932-1-bigeasy@linutronix.de>
References: <20220613140712.77932-1-bigeasy@linutronix.de>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

i_dir_seq is a sequence counter with a lock which is represented by the
lowest bit. The writer atomically updates the counter which ensures that
it can be modified by only one writer at a time. This requires preemption
to be disabled across the write side critical section.

On !PREEMPT_RT kernels this is implied by the caller acquiring
dentry::d_lock. On PREEMPT_RT kernels spin_lock() does not disable
preemption, which means that a preempting writer or reader would
livelock. It's therefore required to disable preemption explicitly.

An alternative solution would be to replace i_dir_seq with a seqlock_t for
PREEMPT_RT, but that comes with its own set of problems due to arbitrary
lock nesting. A pure sequence count with an associated spinlock is not
possible because the locks held by the caller are not necessarily related.

As the critical section is small, disabling preemption is a sensible
solution.
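
For illustration only, the counter-with-lock-bit scheme described above
can be sketched in user space with C11 atomics. This is not the kernel
code: struct dir_seq and the dir_seq_*() helpers are hypothetical names,
and the memory orderings are an assumption (the kernel's cmpxchg() is
fully ordered). An odd value means a writer is active, so any other
writer spins until the value is even again; if the active writer is
preempted by a task that then spins on the same CPU, the counter can
never become even, which is the livelock the patch avoids.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for inode::i_dir_seq. */
struct dir_seq {
	atomic_uint seq;	/* odd: a writer is active, even: idle */
};

/* Spin until the counter is even and we flip it to odd; return the old value. */
static unsigned dir_seq_write_begin(struct dir_seq *d)
{
	for (;;) {
		unsigned n = atomic_load_explicit(&d->seq, memory_order_relaxed);

		/* Only an even value may be claimed, and only by one writer. */
		if (!(n & 1) &&
		    atomic_compare_exchange_weak_explicit(&d->seq, &n, n + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return n;
		/* Busy-wait: someone else holds the write side, retry. */
	}
}

/* Publish the update: n + 2 is even again and differs from the old value. */
static void dir_seq_write_end(struct dir_seq *d, unsigned n)
{
	atomic_store_explicit(&d->seq, n + 2, memory_order_release);
}

/* True while a writer sits between write_begin and write_end. */
static bool dir_seq_write_active(struct dir_seq *d)
{
	return atomic_load_explicit(&d->seq, memory_order_relaxed) & 1;
}
```

The sketch has no equivalent of preempt_disable(); in user space the
spinning side merely wastes time until the writer is scheduled again.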

Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
 fs/dcache.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 93f4f5ee07bfd..92aa72fce5e2e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2563,7 +2563,15 @@ EXPORT_SYMBOL(d_rehash);
 
 static inline unsigned start_dir_add(struct inode *dir)
 {
-
+	/*
+	 * The caller holds a spinlock (dentry::d_lock). On !PREEMPT_RT
+	 * kernels spin_lock() implicitly disables preemption, but not on
+	 * PREEMPT_RT. So for RT it has to be done explicitly to protect
+	 * the sequence count write side critical section against a reader
+	 * or another writer preempting, which would result in a live lock.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	for (;;) {
 		unsigned n = dir->i_dir_seq;
 		if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
@@ -2575,6 +2583,8 @@ static inline unsigned start_dir_add(struct inode *dir)
 static inline void end_dir_add(struct inode *dir, unsigned n)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static void d_wait_lookup(struct dentry *dentry)

From patchwork Mon Jun 13 14:07:10 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12879870
From: Sebastian Andrzej Siewior
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro, Matthew Wilcox, Thomas Gleixner,
 Sebastian Andrzej Siewior
Subject: [PATCH 2/4] fs/dcache: Split __d_lookup_done()
Date: Mon, 13 Jun 2022 16:07:10 +0200
Message-Id: <20220613140712.77932-3-bigeasy@linutronix.de>
In-Reply-To: <20220613140712.77932-1-bigeasy@linutronix.de>
References: <20220613140712.77932-1-bigeasy@linutronix.de>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

__d_lookup_done() wakes waiters on dentry::d_wait inside a preemption
disabled region. This violates the PREEMPT_RT constraints as the wake up
acquires wait_queue_head::lock which is a "sleeping" spinlock on RT.

As a first step to solve this, move the wake up outside of the
hlist_bl_lock() held section.

This is safe because:

  1) The whole sequence including the wake up is protected by
     dentry::d_lock.

  2) The waitqueue head is allocated by the caller on stack and can't go
     away until the whole callchain completes.

  3) If a queued waiter is woken by a spurious wake up, then it is blocked
     on dentry::d_lock before it can observe DCACHE_PAR_LOOKUP cleared and
     return from d_wait_lookup().

     As the wake up is inside the dentry::d_lock held region it's
     guaranteed that the waiter's waitq is dequeued from the waitqueue
     head before the waiter returns.

     Moving the wake up past the unlock of dentry::d_lock would allow the
     waiter to return with the on stack waitq still enqueued due to a
     spurious wake up.

  4) New waiters have to acquire dentry::d_lock before checking whether
     the DCACHE_PAR_LOOKUP flag is set.

Let __d_lookup_unhash():

  1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP
  2) Unhash the dentry
  3) Retrieve and clear dentry::d_wait
  4) Unlock the hash and return the retrieved waitqueue head pointer
  5) Let the caller handle the wake up.

This does not yet solve the PREEMPT_RT problem completely because
preemption is still disabled due to i_dir_seq being held for write. This
will be addressed in subsequent steps.

An alternative solution would be to switch the waitqueue to a simple
waitqueue, but aside from Linus not being a fan of them, moving the wake
up closer to the place where dentry::d_lock is unlocked reduces lock
contention time for the woken up waiter.
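
For illustration only, the split described above (detach the wait queue
pointer in one helper, let the caller do the wake up) can be sketched in
plain single-threaded C. Everything here is hypothetical: struct waitq,
struct entry, entry_unhash() and entry_lookup_done() are stand-ins and
not kernel types or functions, and all locking is omitted, so the sketch
only shows the hand-off of the pointer, not the safety argument.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct waitq {
	int woken;		/* counts wake ups delivered to this queue */
};

struct entry {
	bool in_lookup;		/* plays the role of DCACHE_PAR_LOOKUP */
	struct waitq *wait;	/* plays the role of dentry::d_wait */
};

/*
 * Clear the flag, detach the wait queue pointer and hand it back.
 * In the patch this part runs under hlist_bl_lock(); no wake up
 * happens here.
 */
static struct waitq *entry_unhash(struct entry *e)
{
	struct waitq *wq = e->wait;

	e->in_lookup = false;
	e->wait = NULL;
	return wq;
}

/* The caller decides when to wake; here it is right after the detach. */
static void entry_lookup_done(struct entry *e)
{
	struct waitq *wq = entry_unhash(e);

	if (wq)
		wq->woken++;
}
```

Because the helper returns the pointer instead of using it, a caller is
free to place the wake up wherever its own locking rules permit.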

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
 fs/dcache.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 92aa72fce5e2e..fae4689a9a409 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2711,18 +2711,33 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
 }
 EXPORT_SYMBOL(d_alloc_parallel);
 
-void __d_lookup_done(struct dentry *dentry)
+/*
+ * - Unhash the dentry
+ * - Retrieve and clear the waitqueue head in dentry
+ * - Return the waitqueue head
+ */
+static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
 {
-	struct hlist_bl_head *b = in_lookup_hash(dentry->d_parent,
-						 dentry->d_name.hash);
+	wait_queue_head_t *d_wait;
+	struct hlist_bl_head *b;
+
+	lockdep_assert_held(&dentry->d_lock);
+
+	b = in_lookup_hash(dentry->d_parent, dentry->d_name.hash);
 	hlist_bl_lock(b);
 	dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
 	__hlist_bl_del(&dentry->d_u.d_in_lookup_hash);
-	wake_up_all(dentry->d_wait);
+	d_wait = dentry->d_wait;
 	dentry->d_wait = NULL;
 	hlist_bl_unlock(b);
 	INIT_HLIST_NODE(&dentry->d_u.d_alias);
 	INIT_LIST_HEAD(&dentry->d_lru);
+	return d_wait;
+}
+
+void __d_lookup_done(struct dentry *dentry)
+{
+	wake_up_all(__d_lookup_unhash(dentry));
 }
 EXPORT_SYMBOL(__d_lookup_done);
 

From patchwork Mon Jun 13 14:07:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12879872
From: Sebastian Andrzej Siewior
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro, Matthew Wilcox, Thomas Gleixner,
 Sebastian Andrzej Siewior
Subject: [PATCH 3/4] fs/dcache: Use __d_lookup_unhash() in __d_add/move()
Date: Mon, 13 Jun 2022 16:07:11 +0200
Message-Id: <20220613140712.77932-4-bigeasy@linutronix.de>
In-Reply-To: <20220613140712.77932-1-bigeasy@linutronix.de>
References: <20220613140712.77932-1-bigeasy@linutronix.de>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

__d_add() and __d_move() invoke __d_lookup_done() from within a preemption
disabled region. This violates the PREEMPT_RT constraints as the wake up
acquires wait_queue_head::lock which is a "sleeping" spinlock on RT.

As a preparation for solving this completely, invoke __d_lookup_unhash()
from __d_add/move() and handle the wakeup there.

This allows moving the spin_lock/unlock(dentry::d_lock) pair into
__d_lookup_done(), which debloats the d_lookup_done() inline.

No functional change. Moving the wake up out of the preemption disabled
region on RT will be handled in a subsequent change.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
 fs/dcache.c            | 6 ++++--
 include/linux/dcache.h | 7 ++-----
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index fae4689a9a409..6ef1f5c32bc0f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2737,7 +2737,9 @@ static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
 
 void __d_lookup_done(struct dentry *dentry)
 {
+	spin_lock(&dentry->d_lock);
 	wake_up_all(__d_lookup_unhash(dentry));
+	spin_unlock(&dentry->d_lock);
 }
 EXPORT_SYMBOL(__d_lookup_done);
 
@@ -2751,7 +2753,7 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
 	if (unlikely(d_in_lookup(dentry))) {
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
-		__d_lookup_done(dentry);
+		wake_up_all(__d_lookup_unhash(dentry));
 	}
 	if (inode) {
 		unsigned add_flags = d_flags_for_inode(inode);
@@ -2940,7 +2942,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	if (unlikely(d_in_lookup(target))) {
 		dir = target->d_parent->d_inode;
 		n = start_dir_add(dir);
-		__d_lookup_done(target);
+		wake_up_all(__d_lookup_unhash(target));
 	}
 
 	write_seqcount_begin(&dentry->d_seq);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index f5bba51480b2f..a07a51c858fb4 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -349,7 +349,7 @@ static inline void dont_mount(struct dentry *dentry)
 	spin_unlock(&dentry->d_lock);
 }
 
-extern void __d_lookup_done(struct dentry *);
+extern void __d_lookup_done(struct dentry *dentry);
 
 static inline int d_in_lookup(const struct dentry *dentry)
 {
@@ -358,11 +358,8 @@ static inline int d_in_lookup(const struct dentry *dentry)
 
 static inline void d_lookup_done(struct dentry *dentry)
 {
-	if (unlikely(d_in_lookup(dentry))) {
-		spin_lock(&dentry->d_lock);
+	if (unlikely(d_in_lookup(dentry)))
 		__d_lookup_done(dentry);
-		spin_unlock(&dentry->d_lock);
-	}
 }
 
 extern void dput(struct dentry *);

From patchwork Mon Jun 13 14:07:12 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Sewior
X-Patchwork-Id: 12879871
From: Sebastian Andrzej Siewior
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro, Matthew Wilcox, Thomas Gleixner,
 Sebastian Andrzej Siewior
Subject: [PATCH 4/4] fs/dcache: Move wakeup out of i_seq_dir write held region
Date: Mon, 13 Jun 2022 16:07:12 +0200
Message-Id: <20220613140712.77932-5-bigeasy@linutronix.de>
In-Reply-To: <20220613140712.77932-1-bigeasy@linutronix.de>
References: <20220613140712.77932-1-bigeasy@linutronix.de>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

__d_add() and __d_move() wake up waiters on dentry::d_wait from within the
i_dir_seq write held region. This violates the PREEMPT_RT constraints as
the wake up acquires wait_queue_head::lock which is a "sleeping" spinlock
on RT.

There is no requirement to do so. __d_lookup_unhash() has cleared
DCACHE_PAR_LOOKUP and dentry::d_wait and returned the now unreachable wait
queue head pointer to the caller, so the actual wake up can be postponed
until the i_dir_seq write side critical section is left. The only
requirement is that dentry::d_lock is held across the whole sequence
including the wake up.

This is safe because:

  1) The whole sequence including the wake up is protected by
     dentry::d_lock.

  2) The waitqueue head is allocated by the caller on stack and can't go
     away until the whole callchain completes.

  3) If a queued waiter is woken by a spurious wake up, then it is blocked
     on dentry::d_lock before it can observe DCACHE_PAR_LOOKUP cleared and
     return from d_wait_lookup().

     As the wake up is inside the dentry::d_lock held region it's
     guaranteed that the waiter's waitq is dequeued from the waitqueue
     head before the waiter returns.

     Moving the wake up past the unlock of dentry::d_lock would allow the
     waiter to return with the on stack waitq still enqueued due to a
     spurious wake up.

  4) New waiters have to acquire dentry::d_lock before checking whether
     the DCACHE_PAR_LOOKUP flag is set.

Move the wake up past end_dir_add() which leaves the i_dir_seq write side
critical section and enables preemption.

For non RT kernels there is no difference because preemption is still
disabled due to dentry::d_lock being held, but it shortens the time
between wake up and unlocking dentry::d_lock, which reduces the contention
for the woken up waiter.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
 fs/dcache.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 6ef1f5c32bc0f..0b5fd3a17ff7c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2747,13 +2747,15 @@ EXPORT_SYMBOL(__d_lookup_done);
 
 static inline void __d_add(struct dentry *dentry, struct inode *inode)
 {
+	wait_queue_head_t *d_wait;
 	struct inode *dir = NULL;
 	unsigned n;
+
 	spin_lock(&dentry->d_lock);
 	if (unlikely(d_in_lookup(dentry))) {
 		dir = dentry->d_parent->d_inode;
 		n = start_dir_add(dir);
-		wake_up_all(__d_lookup_unhash(dentry));
+		d_wait = __d_lookup_unhash(dentry);
 	}
 	if (inode) {
 		unsigned add_flags = d_flags_for_inode(inode);
@@ -2764,8 +2766,10 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
 		fsnotify_update_flags(dentry);
 	}
 	__d_rehash(dentry);
-	if (dir)
+	if (dir) {
 		end_dir_add(dir, n);
+		wake_up_all(d_wait);
+	}
 	spin_unlock(&dentry->d_lock);
 	if (inode)
 		spin_unlock(&inode->i_lock);
@@ -2912,6 +2916,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 		     bool exchange)
 {
 	struct dentry *old_parent, *p;
+	wait_queue_head_t *d_wait;
 	struct inode *dir = NULL;
 	unsigned n;
 
@@ -2942,7 +2947,7 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	if (unlikely(d_in_lookup(target))) {
 		dir = target->d_parent->d_inode;
 		n = start_dir_add(dir);
-		wake_up_all(__d_lookup_unhash(target));
+		d_wait = __d_lookup_unhash(target);
 	}
 
 	write_seqcount_begin(&dentry->d_seq);
@@ -2977,8 +2982,10 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_end(&target->d_seq);
 	write_seqcount_end(&dentry->d_seq);
 
-	if (dir)
+	if (dir) {
 		end_dir_add(dir, n);
+		wake_up_all(d_wait);
+	}
 
 	if (dentry->d_parent != old_parent)
 		spin_unlock(&dentry->d_parent->d_lock);