From patchwork Thu Aug 1 01:01:27 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069861
Message-Id: <20190801010943.975240672@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:27 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, Jan Kara, "Theodore Ts'o", Matthew Wilcox, Alexander Viro, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, Mark Fasheh, Joseph Qi, Joel Becker
Subject: [patch V2 1/7] locking/lockdep: Add Kconfig option for bit spinlocks
References: <20190801010126.245731659@linutronix.de>

Some usage sites of bit spinlocks have a substitution with regular spinlocks which depends on CONFIG_PREEMPT_RT. But this substitution can also be used to expose these locks to the regular lock debugging infrastructure, e.g. lockdep. As this increases the size of the affected data structures significantly, the substitution is guarded by a separate Kconfig switch.

Note that only bit spinlocks which have a substitution implemented are covered by this. All other bit spinlocks evade lock debugging as before.

Signed-off-by: Thomas Gleixner
---
 lib/Kconfig.debug | 10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1201,6 +1201,16 @@ config DEBUG_RWSEMS
 	  This debugging feature allows mismatched rw semaphore locks and
 	  unlocks to be detected and reported.
 
+config DEBUG_BIT_SPINLOCKS
+	bool "Bit spinlock debugging"
+	depends on DEBUG_SPINLOCK
+	help
+	  This debugging feature substitutes bit spinlocks in some use
+	  cases, e.g. buffer head, zram, with regular spinlocks so
+	  these locks are exposed to lock debugging features.
+
+	  Not all bit spinlocks are covered by this.
+
 config DEBUG_LOCK_ALLOC
 	bool "Lock debugging: detect incorrect freeing of live locks"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT

From patchwork Thu Aug 1 01:01:28 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069865
Message-Id: <20190801010944.065808812@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:28 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, Jan Kara, "Theodore Ts'o", Matthew Wilcox, Alexander Viro, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, Mark Fasheh, Joseph Qi, Joel Becker
Subject: [patch V2 2/7] fs/buffer: Move BH_Uptodate_Lock locking into wrapper functions
References: <20190801010126.245731659@linutronix.de>

Bit spinlocks are problematic if PREEMPT_RT is enabled, because they disable preemption, which is undesired for latency reasons and breaks when regular spinlocks are taken within the bit_spinlock locked region, because regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT replaces the bit spinlocks with regular spinlocks to avoid this problem.

To avoid ifdeffery at the source level, wrap all BH_Uptodate_Lock bitlock operations with inline functions, so the spinlock substitution can be done at one place. Using regular spinlocks can also be enabled for lock debugging purposes so the lock operations become visible to lockdep.
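For context on why the bit spinlock itself is the problem: a bit spinlock packs the lock into a single bit of an existing word (BH_Uptodate_Lock lives in bh->b_state), so there is no storage for a lockdep map and the lock stays invisible to the lock debugging infrastructure. The following is a simplified sketch of bit_spin_lock() from include/linux/bit_spinlock.h (SMP case, sparse annotations omitted), which also shows why the whole locked region runs with preemption disabled:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	/*
	 * Preemption stays disabled for the entire locked region;
	 * on PREEMPT_RT this forbids taking any sleeping lock inside.
	 */
	preempt_disable();
	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
		preempt_enable();
		do {
			cpu_relax();	/* spin-wait until the bit clears */
		} while (test_bit(bitnum, addr));
		preempt_disable();
	}
}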
Signed-off-by: Thomas Gleixner Reviewed-by: Jan Kara Cc: "Theodore Ts'o" Cc: Matthew Wilcox Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org --- V2: Collected Reviewed-by tag --- fs/buffer.c | 20 ++++++-------------- fs/ext4/page-io.c | 6 ++---- fs/ntfs/aops.c | 10 +++------- include/linux/buffer_head.h | 16 ++++++++++++++++ 4 files changed, 27 insertions(+), 25 deletions(-) --- a/fs/buffer.c +++ b/fs/buffer.c @@ -275,8 +275,7 @@ static void end_buffer_async_read(struct * decide that the page is now completely done. */ first = page_buffers(page); - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &first->b_state); + flags = bh_uptodate_lock_irqsave(first); clear_buffer_async_read(bh); unlock_buffer(bh); tmp = bh; @@ -289,8 +288,7 @@ static void end_buffer_async_read(struct } tmp = tmp->b_this_page; } while (tmp != bh); - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + bh_uptodate_unlock_irqrestore(first, flags); /* * If none of the buffers had errors and they are all @@ -302,9 +300,7 @@ static void end_buffer_async_read(struct return; still_busy: - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); - return; + bh_uptodate_unlock_irqrestore(first, flags); } /* @@ -331,8 +327,7 @@ void end_buffer_async_write(struct buffe } first = page_buffers(page); - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &first->b_state); + flags = bh_uptodate_lock_irqsave(first); clear_buffer_async_write(bh); unlock_buffer(bh); @@ -344,15 +339,12 @@ void end_buffer_async_write(struct buffe } tmp = tmp->b_this_page; } - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + bh_uptodate_unlock_irqrestore(first, flags); end_page_writeback(page); return; still_busy: - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); - return; + bh_uptodate_unlock_irqrestore(first, flags); } EXPORT_SYMBOL(end_buffer_async_write); --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -90,8 +90,7 @@ static void ext4_finish_bio(struct bio * * We check all buffers in the page under BH_Uptodate_Lock * to avoid races with other end io clearing async_write flags */ - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &head->b_state); + flags = bh_uptodate_lock_irqsave(head); do { if (bh_offset(bh) < bio_start || bh_offset(bh) + bh->b_size > bio_end) { @@ -103,8 +102,7 @@ static void ext4_finish_bio(struct bio * if (bio->bi_status) buffer_io_error(bh); } while ((bh = bh->b_this_page) != head); - bit_spin_unlock(BH_Uptodate_Lock, &head->b_state); - local_irq_restore(flags); + bh_uptodate_unlock_irqrestore(head, flags); if (!under_io) { fscrypt_free_bounce_page(bounce_page); end_page_writeback(page); --- a/fs/ntfs/aops.c +++ b/fs/ntfs/aops.c @@ -92,8 +92,7 @@ static void ntfs_end_buffer_async_read(s "0x%llx.", (unsigned long long)bh->b_blocknr); } first = page_buffers(page); - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &first->b_state); + flags = bh_uptodate_lock_irqsave(first); clear_buffer_async_read(bh); unlock_buffer(bh); tmp = bh; @@ -108,8 +107,7 @@ static void ntfs_end_buffer_async_read(s } tmp = tmp->b_this_page; } while (tmp != bh); - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + bh_uptodate_unlock_irqrestore(first, flags); /* * If none of the buffers had errors then we can set the page uptodate, * but we first have to perform the post read mst fixups, if the @@ -142,9 +140,7 @@ static void ntfs_end_buffer_async_read(s unlock_page(page); 
return; still_busy: - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); - return; + bh_uptodate_unlock_irqrestore(first, flags); } /** --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -78,6 +78,22 @@ struct buffer_head { atomic_t b_count; /* users using this buffer_head */ }; +static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh) +{ + unsigned long flags; + + local_irq_save(flags); + bit_spin_lock(BH_Uptodate_Lock, &bh->b_state); + return flags; +} + +static inline void +bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags) +{ + bit_spin_unlock(BH_Uptodate_Lock, &bh->b_state); + local_irq_restore(flags); +} + /* * macro tricks to expand the set_buffer_foo(), clear_buffer_foo() * and buffer_foo() functions.

From patchwork Thu Aug 1 01:01:29 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069863
Message-Id: <20190801010944.174039805@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:29 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, Jan Kara, "Theodore Ts'o", Alexander Viro, Matthew Wilcox, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, Mark Fasheh, Joseph Qi, Joel Becker
Subject: [patch V2 3/7] fs/buffer: Substitute BH_Uptodate_Lock for RT and bit spinlock debugging
References: <20190801010126.245731659@linutronix.de>

Bit spinlocks are problematic if PREEMPT_RT is enabled. They disable preemption, which is undesired for latency reasons and breaks when regular spinlocks are taken within the bit_spinlock locked region because regular spinlocks are converted to 'sleeping spinlocks' on RT.
Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock for PREEMPT_RT enabled kernels. Bit spinlocks are also not covered by lock debugging, e.g. lockdep. With the spinlock substitution in place, they can be exposed via CONFIG_DEBUG_BIT_SPINLOCKS. Signed-off-by: Thomas Gleixner Reviewed-by: Jan Kara Cc: "Theodore Ts'o" Cc: Alexander Viro Cc: Matthew Wilcox Cc: linux-fsdevel@vger.kernel.org --- V2: Collected Reviewed-by tag --- fs/buffer.c | 1 + include/linux/buffer_head.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+) --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3360,6 +3360,7 @@ struct buffer_head *alloc_buffer_head(gf struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags); if (ret) { INIT_LIST_HEAD(&ret->b_assoc_buffers); + buffer_head_init_locks(ret); preempt_disable(); __this_cpu_inc(bh_accounting.nr); recalc_bh_state(); --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -76,8 +76,35 @@ struct buffer_head { struct address_space *b_assoc_map; /* mapping this buffer is associated with */ atomic_t b_count; /* users using this buffer_head */ + +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS) + spinlock_t b_uptodate_lock; +#endif }; +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_BIT_SPINLOCKS) + +static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh) +{ + unsigned long flags; + + spin_lock_irqsave(&bh->b_uptodate_lock, flags); + return flags; +} + +static inline void +bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags) +{ + spin_unlock_irqrestore(&bh->b_uptodate_lock, flags); +} + +static inline void buffer_head_init_locks(struct buffer_head *bh) +{ + spin_lock_init(&bh->b_uptodate_lock); +} + +#else /* PREEMPT_RT || DEBUG_BIT_SPINLOCKS */ + static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh) { unsigned long flags; @@ -94,6 +121,10 @@ bh_uptodate_unlock_irqrestore(struct buf local_irq_restore(flags); } +static inline void buffer_head_init_locks(struct buffer_head *bh) { } + +#endif /* !PREEMPT_RT && !DEBUG_BIT_SPINLOCKS */ + /* * macro tricks to expand the set_buffer_foo(), clear_buffer_foo() * and buffer_foo() functions. 
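Taken together, patches 2 and 3 make the substitution invisible at the call sites: the same caller code compiles against either the bit spinlock or the regular spinlock, selected by CONFIG_PREEMPT_RT / CONFIG_DEBUG_BIT_SPINLOCKS. A minimal caller-side sketch, condensed from the end_buffer_async_*() conversions above:

	struct buffer_head *first = page_buffers(page);
	unsigned long flags;

	/*
	 * Takes either BH_Uptodate_Lock in first->b_state or
	 * first->b_uptodate_lock, depending on the configuration.
	 */
	flags = bh_uptodate_lock_irqsave(first);
	/* ... walk and update the buffers of the page ... */
	bh_uptodate_unlock_irqrestore(first, flags);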
From patchwork Thu Aug 1 01:01:30 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069875
Message-Id: <20190801010944.268628059@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:30 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, linux-ext4@vger.kernel.org, "Theodore Ts'o", Jan Kara, Matthew Wilcox, Alexander Viro, linux-fsdevel@vger.kernel.org, Mark Fasheh, Joseph Qi, Joel Becker
Subject: [patch V2 4/7] fs/jbd2: Remove jbd_trylock_bh_state()
References: <20190801010126.245731659@linutronix.de>

No users.
Signed-off-by: Thomas Gleixner
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o"
Cc: Jan Kara
Reviewed-by: Jan Kara
---
 include/linux/jbd2.h | 5 -----
 1 file changed, 5 deletions(-)

--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -347,11 +347,6 @@ static inline void jbd_lock_bh_state(str
 	bit_spin_lock(BH_State, &bh->b_state);
 }
 
-static inline int jbd_trylock_bh_state(struct buffer_head *bh)
-{
-	return bit_spin_trylock(BH_State, &bh->b_state);
-}
-
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
 	return bit_spin_is_locked(BH_State, &bh->b_state);

From patchwork Thu Aug 1 01:01:31 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069869
Message-Id: <20190801010944.364767635@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:31 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, linux-ext4@vger.kernel.org, "Theodore Ts'o", Jan Kara, Matthew Wilcox, Alexander Viro, linux-fsdevel@vger.kernel.org, Mark Fasheh, Joseph Qi, Joel Becker
Subject: [patch V2 5/7] fs/jbd2: Simplify journal_unmap_buffer()
References: <20190801010126.245731659@linutronix.de>

journal_unmap_buffer() first checks whether the buffer head has a journal head attached. If so, it takes the locks and then invokes jbd2_journal_grab_journal_head(), followed by another check whether this is a journal head buffer.

The double checking is pointless. Replace the initial check with jbd2_journal_grab_journal_head(), which already checks whether the buffer head is actually a journal head buffer. This also allows early access to the journal head pointer for the upcoming conversion of the state lock to a regular spinlock.
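The redundancy is visible in jbd2_journal_grab_journal_head() itself, which performs the same buffer_jbd() check under the BH_JournalHead bit lock and pins the journal head before returning it. Roughly (a sketch of the existing fs/jbd2/journal.c implementation):

struct journal_head *jbd2_journal_grab_journal_head(struct buffer_head *bh)
{
	struct journal_head *jh = NULL;

	jbd_lock_bh_journal_head(bh);
	if (buffer_jbd(bh)) {		/* the check the call site duplicated */
		jh = bh2jh(bh);
		jh->b_jcount++;		/* pin it, so the jh cannot vanish */
	}
	jbd_unlock_bh_journal_head(bh);
	return jh;
}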
Signed-off-by: Thomas Gleixner
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o"
Cc: Jan Kara
Reviewed-by: Jan Kara
---
V2: New patch
---
 fs/jbd2/transaction.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2196,7 +2196,8 @@ static int journal_unmap_buffer(journal_
 	 * holding the page lock. --sct
 	 */
-	if (!buffer_jbd(bh))
+	jh = jbd2_journal_grab_journal_head(bh);
+	if (!jh)
 		goto zap_buffer_unlocked;
 
 	/* OK, we have data buffer in journaled mode */
@@ -2204,10 +2205,6 @@ static int journal_unmap_buffer(journal_
 	jbd_lock_bh_state(bh);
 	spin_lock(&journal->j_list_lock);
 
-	jh = jbd2_journal_grab_journal_head(bh);
-	if (!jh)
-		goto zap_buffer_no_jh;
-
 	/*
 	 * We cannot remove the buffer from checkpoint lists until the
 	 * transaction adding inode to orphan list (let's call it T)
@@ -2329,7 +2326,6 @@ static int journal_unmap_buffer(journal_
 	 */
 	jh->b_modified = 0;
 	jbd2_journal_put_journal_head(jh);
-zap_buffer_no_jh:
 	spin_unlock(&journal->j_list_lock);
 	jbd_unlock_bh_state(bh);
 	write_unlock(&journal->j_state_lock);

From patchwork Thu Aug 1 01:01:32 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069871
Message-Id: <20190801010944.457499601@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:32 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, Jan Kara, "Theodore Ts'o", Mark Fasheh, Joseph Qi, Joel Becker, linux-ext4@vger.kernel.org, Matthew Wilcox, Alexander Viro, linux-fsdevel@vger.kernel.org
Subject: [patch V2 6/7] fs/jbd2: Make state lock a spinlock
References: <20190801010126.245731659@linutronix.de>

Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep on RT, e.g.
spin_lock() or alloc/free(), are invoked inside the lock held region, because bit spinlocks disable preemption even on RT.

A first attempt was to replace the state lock with a spinlock placed in struct buffer_head and to make the locking conditional on PREEMPT_RT and DEBUG_BIT_SPINLOCKS. Jan pointed out that there is a 4 byte hole in struct journal_head where a regular spinlock fits in, and that he would not object to converting the state lock to a spinlock unconditionally.

Aside from solving the RT problem, this also gains lockdep coverage for the journal head state lock (bit-spinlocks are not covered by lockdep, as it is hard to fit a lockdep map into a single bit).

The trivial change would have been to convert the jbd_*lock_bh_state() inlines, but that comes with the downside that these functions take a buffer head pointer which needs to be converted to a journal head pointer, adding another level of indirection. As almost all functions which use this lock have a journal head pointer readily available, it makes more sense to remove the lock helper inlines and write out spin_*lock() at all call sites. Fix up all locking comments as well.

Suggested-by: Jan Kara
Signed-off-by: Thomas Gleixner
Cc: "Theodore Ts'o"
Cc: Mark Fasheh
Cc: Joseph Qi
Cc: Joel Becker
Cc: Jan Kara
Cc: linux-ext4@vger.kernel.org
---
V2: New patch
---
 fs/jbd2/commit.c | 8 ++-- fs/jbd2/journal.c | 10 +++-- fs/jbd2/transaction.c | 83 +++++++++++++++++++++---------------------- fs/ocfs2/suballoc.c | 19 +++++---- include/linux/jbd2.h | 20 +--------- include/linux/journal-head.h | 19 ++++++--- 6 files changed, 77 insertions(+), 82 deletions(-) --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -482,10 +482,10 @@ void jbd2_journal_commit_transaction(jou if (jh->b_committed_data) { struct buffer_head *bh = jh2bh(jh); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); jbd2_free(jh->b_committed_data, bh->b_size); jh->b_committed_data = NULL; - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); } jbd2_journal_refile_buffer(journal, jh); } @@ -927,7 +927,7 @@ void jbd2_journal_commit_transaction(jou * done with it.
*/ get_bh(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); J_ASSERT_JH(jh, jh->b_transaction == commit_transaction); /* @@ -1023,7 +1023,7 @@ void jbd2_journal_commit_transaction(jou } JBUFFER_TRACE(jh, "refile or unfile buffer"); __jbd2_journal_refile_buffer(jh); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); if (try_to_free) release_buffer_page(bh); /* Drops bh reference */ else --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -365,7 +365,7 @@ int jbd2_journal_write_metadata_buffer(t /* keep subsequent assertions sane */ atomic_set(&new_bh->b_count, 1); - jbd_lock_bh_state(bh_in); + spin_lock(&jh_in->state_lock); repeat: /* * If a new transaction has already done a buffer copy-out, then @@ -407,13 +407,13 @@ int jbd2_journal_write_metadata_buffer(t if (need_copy_out && !done_copy_out) { char *tmp; - jbd_unlock_bh_state(bh_in); + spin_unlock(&jh_in->state_lock); tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS); if (!tmp) { brelse(new_bh); return -ENOMEM; } - jbd_lock_bh_state(bh_in); + spin_lock(&jh_in->state_lock); if (jh_in->b_frozen_data) { jbd2_free(tmp, bh_in->b_size); goto repeat; @@ -466,7 +466,7 @@ int jbd2_journal_write_metadata_buffer(t __jbd2_journal_file_buffer(jh_in, transaction, BJ_Shadow); spin_unlock(&journal->j_list_lock); set_buffer_shadow(bh_in); - jbd_unlock_bh_state(bh_in); + spin_unlock(&jh_in->state_lock); return do_escape | (done_copy_out << 1); } @@ -2412,6 +2412,8 @@ static struct journal_head *journal_allo ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS | __GFP_NOFAIL); } + if (ret) + spin_lock_init(&ret->state_lock); return ret; } --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -876,7 +876,7 @@ do_get_write_access(handle_t *handle, st start_lock = jiffies; lock_buffer(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); /* If it takes too long to lock the buffer, trace it */ time_lock = jbd2_time_diff(start_lock, jiffies); @@ -926,7 +926,7 @@ do_get_write_access(handle_t *handle, st error = -EROFS; if (is_handle_aborted(handle)) { - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); goto out; } error = 0; @@ -990,7 +990,7 @@ do_get_write_access(handle_t *handle, st */ if (buffer_shadow(bh)) { JBUFFER_TRACE(jh, "on shadow: sleep"); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE); goto repeat; } @@ -1011,7 +1011,7 @@ do_get_write_access(handle_t *handle, st JBUFFER_TRACE(jh, "generate frozen data"); if (!frozen_buffer) { JBUFFER_TRACE(jh, "allocate memory for buffer"); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); frozen_buffer = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS | __GFP_NOFAIL); goto repeat; @@ -1030,7 +1030,7 @@ do_get_write_access(handle_t *handle, st jh->b_next_transaction = transaction; done: - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); /* * If we are about to journal a buffer, then any revoke pending on it is @@ -1169,7 +1169,7 @@ int jbd2_journal_get_create_access(handl * that case: the transaction must have deleted the buffer for it to be * reused here. */ - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); J_ASSERT_JH(jh, (jh->b_transaction == transaction || jh->b_transaction == NULL || (jh->b_transaction == journal->j_committing_transaction && @@ -1204,7 +1204,7 @@ int jbd2_journal_get_create_access(handl jh->b_next_transaction = transaction; spin_unlock(&journal->j_list_lock); } - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); /* * akpm: I added this. 
ext3_alloc_branch can pick up new indirect @@ -1272,13 +1272,13 @@ int jbd2_journal_get_undo_access(handle_ committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS|__GFP_NOFAIL); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); if (!jh->b_committed_data) { /* Copy out the current buffer contents into the * preserved, committed copy. */ JBUFFER_TRACE(jh, "generate b_committed data"); if (!committed_data) { - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); goto repeat; } @@ -1286,7 +1286,7 @@ int jbd2_journal_get_undo_access(handle_ committed_data = NULL; memcpy(jh->b_committed_data, bh->b_data, bh->b_size); } - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); out: jbd2_journal_put_journal_head(jh); if (unlikely(committed_data)) @@ -1387,16 +1387,16 @@ int jbd2_journal_dirty_metadata(handle_t */ if (jh->b_transaction != transaction && jh->b_next_transaction != transaction) { - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); J_ASSERT_JH(jh, jh->b_transaction == transaction || jh->b_next_transaction == transaction); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); } if (jh->b_modified == 1) { /* If it's in our transaction it must be in BJ_Metadata list. */ if (jh->b_transaction == transaction && jh->b_jlist != BJ_Metadata) { - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); if (jh->b_transaction == transaction && jh->b_jlist != BJ_Metadata) pr_err("JBD2: assertion failure: h_type=%u " @@ -1406,13 +1406,13 @@ int jbd2_journal_dirty_metadata(handle_t jh->b_jlist); J_ASSERT_JH(jh, jh->b_transaction != transaction || jh->b_jlist == BJ_Metadata); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); } goto out; } journal = transaction->t_journal; - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); if (jh->b_modified == 0) { /* @@ -1498,7 +1498,7 @@ int jbd2_journal_dirty_metadata(handle_t __jbd2_journal_file_buffer(jh, transaction, BJ_Metadata); spin_unlock(&journal->j_list_lock); out_unlock_bh: - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); out: JBUFFER_TRACE(jh, "exit"); return ret; @@ -1536,18 +1536,18 @@ int jbd2_journal_forget (handle_t *handl BUFFER_TRACE(bh, "entry"); - jbd_lock_bh_state(bh); - if (!buffer_jbd(bh)) goto not_jbd; + jh = bh2jh(bh); + spin_lock(&jh->state_lock); /* Critical error: attempting to delete a bitmap buffer, maybe? * Don't do any jbd operations, and return an error. */ if (!J_EXPECT_JH(jh, !jh->b_committed_data, "inconsistent data on disk")) { err = -EIO; - goto not_jbd; + goto bad_jbd; } /* keep track of whether or not this transaction modified us */ @@ -1664,7 +1664,7 @@ int jbd2_journal_forget (handle_t *handl spin_unlock(&journal->j_list_lock); } - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); __brelse(bh); drop: if (drop_reserve) { @@ -1673,8 +1673,9 @@ int jbd2_journal_forget (handle_t *handl } return err; +bad_jbd: + spin_unlock(&jh->state_lock); not_jbd: - jbd_unlock_bh_state(bh); __bforget(bh); goto drop; } @@ -1875,7 +1876,7 @@ int jbd2_journal_stop(handle_t *handle) * * j_list_lock is held. * - * jbd_lock_bh_state(jh2bh(jh)) is held. + * jh->state_lock is held. */ static inline void @@ -1899,7 +1900,7 @@ static inline void * * Called with j_list_lock held, and the journal may not be locked. * - * jbd_lock_bh_state(jh2bh(jh)) is held. + * jh->state_lock is held. 
*/ static inline void @@ -1931,7 +1932,7 @@ static void __jbd2_journal_temp_unlink_b transaction_t *transaction; struct buffer_head *bh = jh2bh(jh); - J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); + assert_spin_locked(&jh->state_lock); transaction = jh->b_transaction; if (transaction) assert_spin_locked(&transaction->t_journal->j_list_lock); @@ -1987,18 +1988,18 @@ void jbd2_journal_unfile_buffer(journal_ /* Get reference so that buffer cannot be freed before we unlock it */ get_bh(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); spin_lock(&journal->j_list_lock); __jbd2_journal_unfile_buffer(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); __brelse(bh); } /* * Called from jbd2_journal_try_to_free_buffers(). * - * Called under jbd_lock_bh_state(bh) + * Called under spin_lock(&jh->state_lock) */ static void __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh) @@ -2085,10 +2086,10 @@ int jbd2_journal_try_to_free_buffers(jou if (!jh) continue; - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); __journal_try_to_free_buffer(journal, bh); jbd2_journal_put_journal_head(jh); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); if (buffer_jbd(bh)) goto busy; } while ((bh = bh->b_this_page) != head); @@ -2109,7 +2110,7 @@ int jbd2_journal_try_to_free_buffers(jou * * Called under j_list_lock. * - * Called under jbd_lock_bh_state(bh). + * Called under spin_lock(&jh->state_lock). */ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction) { @@ -2202,7 +2203,7 @@ static int journal_unmap_buffer(journal_ /* OK, we have data buffer in journaled mode */ write_lock(&journal->j_state_lock); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); spin_lock(&journal->j_list_lock); /* @@ -2285,7 +2286,7 @@ static int journal_unmap_buffer(journal_ if (partial_page) { jbd2_journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); write_unlock(&journal->j_state_lock); return -EBUSY; } @@ -2300,7 +2301,7 @@ static int journal_unmap_buffer(journal_ jh->b_next_transaction = journal->j_running_transaction; jbd2_journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); write_unlock(&journal->j_state_lock); return 0; } else { @@ -2327,7 +2328,7 @@ static int journal_unmap_buffer(journal_ jh->b_modified = 0; jbd2_journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); write_unlock(&journal->j_state_lock); zap_buffer_unlocked: clear_buffer_dirty(bh); @@ -2415,7 +2416,7 @@ void __jbd2_journal_file_buffer(struct j int was_dirty = 0; struct buffer_head *bh = jh2bh(jh); - J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); + assert_spin_locked(&jh->state_lock); assert_spin_locked(&transaction->t_journal->j_list_lock); J_ASSERT_JH(jh, jh->b_jlist < BJ_Types); @@ -2477,11 +2478,11 @@ void __jbd2_journal_file_buffer(struct j void jbd2_journal_file_buffer(struct journal_head *jh, transaction_t *transaction, int jlist) { - jbd_lock_bh_state(jh2bh(jh)); + spin_lock(&jh->state_lock); spin_lock(&transaction->t_journal->j_list_lock); __jbd2_journal_file_buffer(jh, transaction, jlist); spin_unlock(&transaction->t_journal->j_list_lock); - jbd_unlock_bh_state(jh2bh(jh)); + spin_unlock(&jh->state_lock); } /* @@ -2491,7 +2492,7 @@ void jbd2_journal_file_buffer(struct jou * buffer on that transaction's metadata list. 
* * Called under j_list_lock - * Called under jbd_lock_bh_state(jh2bh(jh)) + * Called under jh->state_lock * * jh and bh may be already free when this function returns */ @@ -2500,7 +2501,7 @@ void __jbd2_journal_refile_buffer(struct int was_dirty, jlist; struct buffer_head *bh = jh2bh(jh); - J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); + assert_spin_locked(&jh->state_lock); if (jh->b_transaction) assert_spin_locked(&jh->b_transaction->t_journal->j_list_lock); @@ -2549,10 +2550,10 @@ void jbd2_journal_refile_buffer(journal_ /* Get reference so that buffer cannot be freed before we unlock it */ get_bh(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->state_lock); spin_lock(&journal->j_list_lock); __jbd2_journal_refile_buffer(jh); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->state_lock); spin_unlock(&journal->j_list_lock); __brelse(bh); } --- a/fs/ocfs2/suballoc.c +++ b/fs/ocfs2/suballoc.c @@ -1252,6 +1252,7 @@ static int ocfs2_test_bg_bit_allocatable int nr) { struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data; + struct journal_head *jh; int ret; if (ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap)) @@ -1260,13 +1261,14 @@ static int ocfs2_test_bg_bit_allocatable if (!buffer_jbd(bg_bh)) return 1; - jbd_lock_bh_state(bg_bh); - bg = (struct ocfs2_group_desc *) bh2jh(bg_bh)->b_committed_data; + jh = bh2jh(bg_bh); + spin_lock(&jh->state_lock); + bg = (struct ocfs2_group_desc *) jh->b_committed_data; if (bg) ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap); else ret = 1; - jbd_unlock_bh_state(bg_bh); + spin_unlock(&jh->state_lock); return ret; } @@ -2387,6 +2389,7 @@ static int ocfs2_block_group_clear_bits( int status; unsigned int tmp; struct ocfs2_group_desc *undo_bg = NULL; + struct journal_head *jh; /* The caller got this descriptor from * ocfs2_read_group_descriptor(). Any corruption is a code bug. */ @@ -2405,10 +2408,10 @@ static int ocfs2_block_group_clear_bits( goto bail; } + jh = bh2jh(group_bh); if (undo_fn) { - jbd_lock_bh_state(group_bh); - undo_bg = (struct ocfs2_group_desc *) - bh2jh(group_bh)->b_committed_data; + spin_lock(&jh->state_lock); + undo_bg = (struct ocfs2_group_desc *) jh->b_committed_data; BUG_ON(!undo_bg); } @@ -2423,7 +2426,7 @@ static int ocfs2_block_group_clear_bits( le16_add_cpu(&bg->bg_free_bits_count, num_bits); if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) { if (undo_fn) - jbd_unlock_bh_state(group_bh); + spin_unlock(&jh->state_lock); return ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit count %u but claims %u are freed. 
num_bits %d\n", (unsigned long long)le64_to_cpu(bg->bg_blkno), le16_to_cpu(bg->bg_bits), @@ -2432,7 +2435,7 @@ static int ocfs2_block_group_clear_bits( } if (undo_fn) - jbd_unlock_bh_state(group_bh); + spin_unlock(&jh->state_lock); ocfs2_journal_dirty(handle, group_bh); bail: --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -313,7 +313,6 @@ enum jbd_state_bits { BH_Revoked, /* Has been revoked from the log */ BH_RevokeValid, /* Revoked flag is valid */ BH_JBDDirty, /* Is dirty but journaled */ - BH_State, /* Pins most journal_head state */ BH_JournalHead, /* Pins bh->b_private and jh->b_bh */ BH_Shadow, /* IO on shadow buffer is running */ BH_Verified, /* Metadata block has been verified ok */ @@ -342,21 +341,6 @@ static inline struct journal_head *bh2jh return bh->b_private; } -static inline void jbd_lock_bh_state(struct buffer_head *bh) -{ - bit_spin_lock(BH_State, &bh->b_state); -} - -static inline int jbd_is_locked_bh_state(struct buffer_head *bh) -{ - return bit_spin_is_locked(BH_State, &bh->b_state); -} - -static inline void jbd_unlock_bh_state(struct buffer_head *bh) -{ - bit_spin_unlock(BH_State, &bh->b_state); -} - static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) { bit_spin_lock(BH_JournalHead, &bh->b_state); @@ -551,9 +535,9 @@ struct transaction_chp_stats_s { * ->jbd_lock_bh_journal_head() (This is "innermost") * * j_state_lock - * ->jbd_lock_bh_state() + * ->jh->state_lock * - * jbd_lock_bh_state() + * jh->state_lock * ->j_list_lock * * j_state_lock --- a/include/linux/journal-head.h +++ b/include/linux/journal-head.h @@ -24,13 +24,18 @@ struct journal_head { struct buffer_head *b_bh; /* + * Protect the buffer head state + */ + spinlock_t state_lock; + + /* * Reference count - see description in journal.c * [jbd_lock_bh_journal_head()] */ int b_jcount; /* - * Journalling list for this buffer [jbd_lock_bh_state()] + * Journalling list for this buffer [jh->state_lock] * NOTE: We *cannot* combine this with b_modified into a bitfield * as gcc would then (which the C standard allows but which is * very unuseful) make 64-bit accesses to the bitfield and clobber @@ -41,20 +46,20 @@ struct journal_head { /* * This flag signals the buffer has been modified by * the currently running transaction - * [jbd_lock_bh_state()] + * [jh->state_lock] */ unsigned b_modified; /* * Copy of the buffer data frozen for writing to the log. - * [jbd_lock_bh_state()] + * [jh->state_lock] */ char *b_frozen_data; /* * Pointer to a saved copy of the buffer containing no uncommitted * deallocation references, so that allocations can avoid overwriting - * uncommitted deletes. [jbd_lock_bh_state()] + * uncommitted deletes. [jh->state_lock] */ char *b_committed_data; @@ -63,7 +68,7 @@ struct journal_head { * metadata: either the running transaction or the committing * transaction (if there is one). Only applies to buffers on a * transaction's data or metadata journaling list. - * [j_list_lock] [jbd_lock_bh_state()] + * [j_list_lock] [jh->state_lock] * Either of these locks is enough for reading, both are needed for * changes. */ @@ -73,13 +78,13 @@ struct journal_head { * Pointer to the running compound transaction which is currently * modifying the buffer's metadata, if there was already a transaction * committing it when the new transaction touched it. - * [t_list_lock] [jbd_lock_bh_state()] + * [t_list_lock] [jh->state_lock] */ transaction_t *b_next_transaction; /* * Doubly-linked list of buffers on a transaction's data, metadata or - * forget queue. 
* [t_list_lock] [jbd_lock_bh_state()] + * forget queue. [t_list_lock] [jh->state_lock] */ struct journal_head *b_tnext, *b_tprev;

From patchwork Thu Aug 1 01:01:33 2019
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11069873
Message-Id: <20190801010944.549462805@linutronix.de>
Date: Thu, 01 Aug 2019 03:01:33 +0200
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright, Jan Kara, linux-ext4@vger.kernel.org, "Theodore Ts'o", Matthew Wilcox, Alexander Viro, linux-fsdevel@vger.kernel.org, Mark Fasheh, Joseph Qi, Joel Becker
Subject: [patch V2 7/7] fs/jbd2: Free journal head outside of locked region
References: <20190801010126.245731659@linutronix.de>

On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n, i.e. they disable preemption. That means functions which are not safe to be called in preempt disabled context on RT trigger a might_sleep() assert.

The journal head bit spinlock is mostly held for short code sequences with trivial RT safe functionality, except for one place: jbd2_journal_put_journal_head() invokes __journal_remove_journal_head() with the journal head bit spinlock held. __journal_remove_journal_head() invokes kmem_cache_free(), which must not be called with preemption disabled on RT.

Jan suggested reworking the removal function so the actual free happens outside the bit-spinlocked region. Split it into two parts:

 - Do the sanity checks and the buffer head detach under the lock

 - Do the actual free after dropping the lock

There is error case handling in the free part which needs to dereference the b_size field of the now detached buffer head.
Due to paranoia (caused by ignorance) the size is retrieved in the detach function and handed into the free function. Might be over-engineered, but better safe than sorry. This makes the journal head bit-spinlock usage RT compliant and also avoids nested locking which is not covered by lockdep. Suggested-by: Jan Kara Signed-off-by: Thomas Gleixner Cc: linux-ext4@vger.kernel.org Cc: "Theodore Ts'o" Cc: Jan Kara --- V2: New patch --- fs/jbd2/journal.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -2521,9 +2521,10 @@ struct journal_head *jbd2_journal_grab_j return jh; } -static void __journal_remove_journal_head(struct buffer_head *bh) +static size_t __journal_remove_journal_head(struct buffer_head *bh) { struct journal_head *jh = bh2jh(bh); + size_t b_size = READ_ONCE(bh->b_size); J_ASSERT_JH(jh, jh->b_jcount >= 0); J_ASSERT_JH(jh, jh->b_transaction == NULL); @@ -2533,17 +2534,25 @@ static void __journal_remove_journal_hea J_ASSERT_BH(bh, buffer_jbd(bh)); J_ASSERT_BH(bh, jh2bh(jh) == bh); BUFFER_TRACE(bh, "remove journal_head"); + + /* Unlink before dropping the lock */ + bh->b_private = NULL; + jh->b_bh = NULL; /* debug, really */ + clear_buffer_jbd(bh); + + return b_size; +} + +static void journal_release_journal_head(struct journal_head *jh, size_t b_size) +{ if (jh->b_frozen_data) { printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__); - jbd2_free(jh->b_frozen_data, bh->b_size); + jbd2_free(jh->b_frozen_data, b_size); } if (jh->b_committed_data) { printk(KERN_WARNING "%s: freeing b_committed_data\n", __func__); - jbd2_free(jh->b_committed_data, bh->b_size); + jbd2_free(jh->b_committed_data, b_size); } - bh->b_private = NULL; - jh->b_bh = NULL; /* debug, really */ - clear_buffer_jbd(bh); journal_free_journal_head(jh); } @@ -2559,11 +2568,14 @@ void jbd2_journal_put_journal_head(struc J_ASSERT_JH(jh, jh->b_jcount > 0); --jh->b_jcount; if (!jh->b_jcount) { - __journal_remove_journal_head(bh); + size_t b_size = __journal_remove_journal_head(bh); + jbd_unlock_bh_journal_head(bh); + journal_release_journal_head(jh, b_size); __brelse(bh); - } else + } else { jbd_unlock_bh_journal_head(bh); + } } /*
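Since the two halves of the split are spread over several hunks above, here is the resulting control flow of jbd2_journal_put_journal_head() condensed into one place (a sketch, not a verbatim copy): only the detach work runs under the bit lock, and the actual freeing runs in preemptible context afterwards.

	jbd_lock_bh_journal_head(bh);
	J_ASSERT_JH(jh, jh->b_jcount > 0);
	--jh->b_jcount;
	if (!jh->b_jcount) {
		/*
		 * Under the lock: sanity checks, unlink bh <-> jh and
		 * capture bh->b_size for the error handling in the
		 * free path.
		 */
		size_t b_size = __journal_remove_journal_head(bh);

		jbd_unlock_bh_journal_head(bh);
		/*
		 * Preemptible context: jbd2_free()/kmem_cache_free()
		 * may be called safely on PREEMPT_RT.
		 */
		journal_release_journal_head(jh, b_size);
		__brelse(bh);
	} else {
		jbd_unlock_bh_journal_head(bh);
	}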