From patchwork Wed Apr 14 13:47:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202807 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA325C43460 for ; Wed, 14 Apr 2021 13:39:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8C21C611CC for ; Wed, 14 Apr 2021 13:39:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351415AbhDNNkA (ORCPT ); Wed, 14 Apr 2021 09:40:00 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:15679 "EHLO szxga04-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346798AbhDNNj5 (ORCPT ); Wed, 14 Apr 2021 09:39:57 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga04-in.huawei.com (SkyGuard) with ESMTP id 4FL3RC4z3BzpXtL; Wed, 14 Apr 2021 21:36:39 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:27 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 1/7] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Date: Wed, 14 Apr 2021 21:47:31 +0800 Message-ID: <20210414134737.2366971-2-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The 'out' lable just return the 'ret' value and seems not required, so remove this label and switch to return appropriate value immediately. This patch also do some minor cleanup, no logical change. Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/jbd2/checkpoint.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 63b526d44886..bf5511d19ac5 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -564,13 +564,13 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) struct transaction_chp_stats_s *stats; transaction_t *transaction; journal_t *journal; - int ret = 0; JBUFFER_TRACE(jh, "entry"); - if ((transaction = jh->b_cp_transaction) == NULL) { + transaction = jh->b_cp_transaction; + if (!transaction) { JBUFFER_TRACE(jh, "not on transaction"); - goto out; + return 0; } journal = transaction->t_journal; @@ -579,9 +579,9 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) jh->b_cp_transaction = NULL; jbd2_journal_put_journal_head(jh); - if (transaction->t_checkpoint_list != NULL || - transaction->t_checkpoint_io_list != NULL) - goto out; + /* Is this transaction empty? */ + if (transaction->t_checkpoint_list || transaction->t_checkpoint_io_list) + return 0; /* * There is one special case to worry about: if we have just pulled the @@ -593,10 +593,12 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) * See the comment at the end of jbd2_journal_commit_transaction(). */ if (transaction->t_state != T_FINISHED) - goto out; + return 0; - /* OK, that was the last buffer for the transaction: we can now - safely remove this transaction from the log */ + /* + * OK, that was the last buffer for the transaction, we can now + * safely remove this transaction from the log. + */ stats = &transaction->t_chp_stats; if (stats->cs_chp_time) stats->cs_chp_time = jbd2_time_diff(stats->cs_chp_time, @@ -606,9 +608,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) __jbd2_journal_drop_transaction(journal, transaction); jbd2_journal_free_transaction(transaction); - ret = 1; -out: - return ret; + return 1; } /* From patchwork Wed Apr 14 13:47:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202815 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A264C43603 for ; Wed, 14 Apr 2021 13:39:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 55D5A611AD for ; Wed, 14 Apr 2021 13:39:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351430AbhDNNkF (ORCPT ); Wed, 14 Apr 2021 09:40:05 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:16917 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346798AbhDNNkC (ORCPT ); Wed, 14 Apr 2021 09:40:02 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FL3SV0GztzjZRp; Wed, 14 Apr 2021 21:37:46 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:27 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 2/7] jbd2: ensure abort the journal if detect IO error when writing original buffer back Date: Wed, 14 Apr 2021 21:47:32 +0800 Message-ID: <20210414134737.2366971-3-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Althourh we merged ("jbd2: abort journal if free a async write error metadata buffer"), there is a race between jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still missing to detect the buffer write io error flag and will end up leading to filesystem inconsistency. jbd2_journal_try_to_free_buffers() ext4_put_super() jbd2_journal_destroy() __jbd2_journal_remove_checkpoint() detect buffer write error jbd2_log_do_checkpoint() jbd2_cleanup_journal_tail() <--- lead to inconsistency jbd2_journal_abort() Fix this issue by introduce a new mark j_checkpoint_io_error and set it in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer has write_io_error flag. Then jbd2_journal_destroy() will detect this mark and abort the journal to prevent updating log tail. Signed-off-by: Zhang Yi --- fs/jbd2/checkpoint.c | 32 +++++++++++++++++++------------- fs/jbd2/journal.c | 9 +++++++++ include/linux/jbd2.h | 7 +++++++ 3 files changed, 35 insertions(+), 13 deletions(-) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index bf5511d19ac5..ff3d38ad3a1e 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -221,14 +221,15 @@ int jbd2_log_do_checkpoint(journal_t *journal) result = jbd2_cleanup_journal_tail(journal); trace_jbd2_checkpoint(journal, result); jbd_debug(1, "cleanup_journal_tail returned %d\n", result); - if (result <= 0) + if (result == 0) return result; + if (result < 0) + goto error; /* * OK, we need to start writing disk blocks. Take one transaction * and write it. */ - result = 0; spin_lock(&journal->j_list_lock); if (!journal->j_checkpoint_transactions) goto out; @@ -295,8 +296,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) goto restart; } if (!buffer_dirty(bh)) { - if (unlikely(buffer_write_io_error(bh)) && !result) - result = -EIO; BUFFER_TRACE(bh, "remove from checkpoint"); if (__jbd2_journal_remove_checkpoint(jh)) /* The transaction was released; we're done */ @@ -356,9 +355,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) spin_lock(&journal->j_list_lock); goto restart2; } - if (unlikely(buffer_write_io_error(bh)) && !result) - result = -EIO; - /* * Now in whatever state the buffer currently is, we * know that it has been written out and so we can @@ -369,12 +365,13 @@ int jbd2_log_do_checkpoint(journal_t *journal) } out: spin_unlock(&journal->j_list_lock); - if (result < 0) + result = jbd2_cleanup_journal_tail(journal); + if (result >= 0) + return 0; +error: + if (!is_journal_aborted(journal)) jbd2_journal_abort(journal, result); - else - result = jbd2_cleanup_journal_tail(journal); - - return (result < 0) ? result : 0; + return result; } /* @@ -400,7 +397,7 @@ int jbd2_cleanup_journal_tail(journal_t *journal) tid_t first_tid; unsigned long blocknr; - if (is_journal_aborted(journal)) + if (is_journal_aborted(journal) || journal->j_checkpoint_io_error) return -EIO; if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr)) @@ -564,6 +561,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) struct transaction_chp_stats_s *stats; transaction_t *transaction; journal_t *journal; + struct buffer_head *bh = jh2bh(jh); JBUFFER_TRACE(jh, "entry"); @@ -575,6 +573,14 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) journal = transaction->t_journal; JBUFFER_TRACE(jh, "removing from transaction"); + /* + * If the buffer has been failed to write out to disk, it may probably + * lead to filesystem inconsistency after remove it from the log, so + * mark the journal checkpoint write back IO error. + */ + if (buffer_write_io_error(bh)) + journal->j_checkpoint_io_error = true; + __buffer_unlink(jh); jh->b_cp_transaction = NULL; jbd2_journal_put_journal_head(jh); diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 2dc944442802..6dbeab8b9feb 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1995,6 +1995,15 @@ int jbd2_journal_destroy(journal_t *journal) J_ASSERT(journal->j_checkpoint_transactions == NULL); spin_unlock(&journal->j_list_lock); + /* + * OK, all checkpoint transactions have been checked, now check the + * write out io error flag and abort the journal if some buffer failed + * to write back to the original location, otherwise the filesystem + * may probably becomes inconsistency. + */ + if (!is_journal_aborted(journal) && journal->j_checkpoint_io_error) + jbd2_journal_abort(journal, -EIO); + if (journal->j_sb_buffer) { if (!is_journal_aborted(journal)) { mutex_lock_io(&journal->j_checkpoint_mutex); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 99d3cd051ac3..53ca70f80ad8 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -841,6 +841,13 @@ struct journal_s */ transaction_t *j_checkpoint_transactions; + /** + * @j_checkpoint_io_error: + * + * Detect io error while writing back original buffer to disk. + */ + bool j_checkpoint_io_error; + /** * @j_wait_transaction_locked: * From patchwork Wed Apr 14 13:47:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202821 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E7FDC433B4 for ; Wed, 14 Apr 2021 13:39:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 75FA3613B3 for ; Wed, 14 Apr 2021 13:39:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351448AbhDNNkL (ORCPT ); Wed, 14 Apr 2021 09:40:11 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:16918 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351446AbhDNNkJ (ORCPT ); Wed, 14 Apr 2021 09:40:09 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FL3SV0VWQzjZS5; Wed, 14 Apr 2021 21:37:46 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:27 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 3/7] jbd2: don't abort the journal when freeing buffers Date: Wed, 14 Apr 2021 21:47:33 +0800 Message-ID: <20210414134737.2366971-4-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Now, we can make sure to abort the journal once the original buffer was failed to write back to disk, we can remove the journal abort logic in jbd2_journal_try_to_free_buffers() which was introduced in ("jbd2: abort journal if free a async write error metadata buffer"), because it may costs and propably not safe. Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/jbd2/transaction.c | 17 ----------------- 1 file changed, 17 deletions(-) diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 9396666b7314..3e0db4953fe4 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -2118,7 +2118,6 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page) { struct buffer_head *head; struct buffer_head *bh; - bool has_write_io_error = false; int ret = 0; J_ASSERT(PageLocked(page)); @@ -2143,26 +2142,10 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page) jbd2_journal_put_journal_head(jh); if (buffer_jbd(bh)) goto busy; - - /* - * If we free a metadata buffer which has been failed to - * write out, the jbd2 checkpoint procedure will not detect - * this failure and may lead to filesystem inconsistency - * after cleanup journal tail. - */ - if (buffer_write_io_error(bh)) { - pr_err("JBD2: Error while async write back metadata bh %llu.", - (unsigned long long)bh->b_blocknr); - has_write_io_error = true; - } } while ((bh = bh->b_this_page) != head); ret = try_to_free_buffers(page); - busy: - if (has_write_io_error) - jbd2_journal_abort(journal, -EIO); - return ret; } From patchwork Wed Apr 14 13:47:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202813 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94E90C43462 for ; Wed, 14 Apr 2021 13:39:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 76C81611F2 for ; Wed, 14 Apr 2021 13:39:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351427AbhDNNkE (ORCPT ); Wed, 14 Apr 2021 09:40:04 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:16915 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351378AbhDNNkB (ORCPT ); Wed, 14 Apr 2021 09:40:01 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FL3ST6jdkzjZ9Y; Wed, 14 Apr 2021 21:37:45 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:28 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 4/7] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers() Date: Wed, 14 Apr 2021 21:47:34 +0800 Message-ID: <20210414134737.2366971-5-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch move try_to_free_buffers() out from jbd2_journal_try_to_free_buffers() to the caller function, it just check the buffers are JBD2 journal busy or not, and the caller should invoke try_to_free_buffers() if it want to release page. Signed-off-by: Zhang Yi --- fs/block_dev.c | 8 +++++--- fs/ext4/inode.c | 6 ++++-- fs/ext4/super.c | 5 +++-- fs/jbd2/transaction.c | 17 ++++++++--------- 4 files changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 09d6f7229db9..5ed79a9063f6 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1735,11 +1735,13 @@ EXPORT_SYMBOL_GPL(blkdev_read_iter); static int blkdev_releasepage(struct page *page, gfp_t wait) { struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super; + int ret = 0; if (super && super->s_op->bdev_try_to_free_page) - return super->s_op->bdev_try_to_free_page(super, page, wait); - - return try_to_free_buffers(page); + ret = super->s_op->bdev_try_to_free_page(super, page, wait); + if (!ret) + return try_to_free_buffers(page); + return 0; } static int blkdev_writepages(struct address_space *mapping, diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0948a43f1b3d..3211af9c969f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3301,6 +3301,7 @@ static void ext4_journalled_invalidatepage(struct page *page, static int ext4_releasepage(struct page *page, gfp_t wait) { journal_t *journal = EXT4_JOURNAL(page->mapping->host); + int ret = 0; trace_ext4_releasepage(page); @@ -3308,9 +3309,10 @@ static int ext4_releasepage(struct page *page, gfp_t wait) if (PageChecked(page)) return 0; if (journal) - return jbd2_journal_try_to_free_buffers(journal, page); - else + ret = jbd2_journal_try_to_free_buffers(journal, page); + if (!ret) return try_to_free_buffers(page); + return 0; } static bool ext4_inode_datasync_dirty(struct inode *inode) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index b9693680463a..55c7a0d8d77d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1444,7 +1444,8 @@ static int ext4_nfs_commit_metadata(struct inode *inode) * Try to release metadata pages (indirect blocks, directories) which are * mapped via the block device. Since these pages could have journal heads * which would prevent try_to_free_buffers() from freeing them, we must use - * jbd2 layer's try_to_free_buffers() function to release them. + * jbd2 layer's try_to_free_buffers() function to check and we could + * release them if it return 0. */ static int bdev_try_to_free_page(struct super_block *sb, struct page *page, gfp_t wait) @@ -1457,7 +1458,7 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page, if (journal) return jbd2_journal_try_to_free_buffers(journal, page); - return try_to_free_buffers(page); + return 0; } #ifdef CONFIG_FS_ENCRYPTION diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 3e0db4953fe4..451798051fde 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -2089,10 +2089,9 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh) * if they are fully written out ordered data, move them onto BUF_CLEAN * so try_to_free_buffers() can reap them. * - * This function returns non-zero if we wish try_to_free_buffers() - * to be called. We do this if the page is releasable by try_to_free_buffers(). - * We also do it if the page has locked or dirty buffers and the caller wants - * us to perform sync or async writeout. + * This function returns zero if all the buffers on this page are + * journal cleaned and the caller should invoke try_to_free_buffers() and + * could release page if the page is releasable by try_to_free_buffers(). * * This complicates JBD locking somewhat. We aren't protected by the * BKL here. We wish to remove the buffer from its committing or @@ -2112,7 +2111,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh) * cannot happen because we never reallocate freed data as metadata * while the data is part of a transaction. Yes? * - * Return 0 on failure, 1 on success + * Return 0 on success, -EBUSY if any buffer is still journal busy. */ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page) { @@ -2140,12 +2139,12 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page) __journal_try_to_free_buffer(journal, bh); spin_unlock(&jh->b_state_lock); jbd2_journal_put_journal_head(jh); - if (buffer_jbd(bh)) - goto busy; + if (buffer_jbd(bh)) { + ret = -EBUSY; + break; + } } while ((bh = bh->b_this_page) != head); - ret = try_to_free_buffers(page); -busy: return ret; } From patchwork Wed Apr 14 13:47:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202817 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17B51C43461 for ; Wed, 14 Apr 2021 13:39:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EE4FA611B0 for ; Wed, 14 Apr 2021 13:39:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351443AbhDNNkG (ORCPT ); Wed, 14 Apr 2021 09:40:06 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:16919 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351422AbhDNNkD (ORCPT ); Wed, 14 Apr 2021 09:40:03 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FL3SV0hfJzjZSG; Wed, 14 Apr 2021 21:37:46 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:29 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 5/7] ext4: use RCU to protect accessing superblock in blkdev_releasepage() Date: Wed, 14 Apr 2021 21:47:35 +0800 Message-ID: <20210414134737.2366971-6-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org In blkdev_releasepage() we access the superblock structure directly, it could be raced by umount filesystem on destroy superblock in put_super(), and end up triggering a use after free issue. drop cache umount filesystem bdev_try_to_free_page() get superblock deactivate_locked_super() ... put_super() and free sb by destroy_work access superblock <-- trigger use after free This issue doesn't trigger easily in general because we get page locked when invoking bdev_try_to_free_page(), and when umount filesystem the kill_block_super()->..->kill_bdev()->truncate_inode_pages_range() procedure wait on page unlock, but it's not a guarantee. Fix this race by use RCU to protect superblock in blkdev_releasepage(). Fixes: 87d8fe1ee6b8 ("add releasepage hooks to block devices which can be used by file systems") Reported-by: Jan Kara Signed-off-by: Zhang Yi --- fs/block_dev.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 5ed79a9063f6..cb84f347fb04 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1734,11 +1734,14 @@ EXPORT_SYMBOL_GPL(blkdev_read_iter); */ static int blkdev_releasepage(struct page *page, gfp_t wait) { - struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super; + struct super_block *super; int ret = 0; + rcu_read_lock(); + super = READ_ONCE(BDEV_I(page->mapping->host)->bdev.bd_super); if (super && super->s_op->bdev_try_to_free_page) ret = super->s_op->bdev_try_to_free_page(super, page, wait); + rcu_read_unlock(); if (!ret) return try_to_free_buffers(page); return 0; From patchwork Wed Apr 14 13:47:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202809 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D770DC43470 for ; Wed, 14 Apr 2021 13:39:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B808C611B0 for ; Wed, 14 Apr 2021 13:39:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351421AbhDNNkB (ORCPT ); Wed, 14 Apr 2021 09:40:01 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:16916 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351401AbhDNNkA (ORCPT ); Wed, 14 Apr 2021 09:40:00 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FL3ST6xz1zjZBp; Wed, 14 Apr 2021 21:37:45 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:29 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 6/7] fs: introduce a usage count into the superblock Date: Wed, 14 Apr 2021 21:47:36 +0800 Message-ID: <20210414134737.2366971-7-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Commit <87d8fe1ee6b8> ("add releasepage hooks to block devices which can be used by file systems") introduce a hook that used by ext4 filesystem to release journal buffers, but it doesn't add corresponding concurrency protection that ->bdev_try_to_free_page() could be raced by umount filesystem concurrently. This patch add a usage count on superblock that filesystem can use it to prevent above race and make invoke ->bdev_try_to_free_page() safe. Signed-off-by: Zhang Yi --- include/linux/fs.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index ec8f3ddf4a6a..3c6a5c08c2df 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -1547,6 +1548,13 @@ struct super_block { spinlock_t s_inode_wblist_lock; struct list_head s_inodes_wb; /* writeback inodes */ + + /* + * Count users who are using the super_block, used to protect + * umount filesystem concurrently with others. + */ + struct percpu_ref s_usage_counter; + wait_queue_head_t s_usage_waitq; } __randomize_layout; /* Helper functions so that in most cases filesystems will @@ -1765,6 +1773,27 @@ static inline bool sb_start_intwrite_trylock(struct super_block *sb) bool inode_owner_or_capable(struct user_namespace *mnt_userns, const struct inode *inode); +static inline void sb_usage_counter_release(struct percpu_ref *ref) +{ + struct super_block *sb; + + sb = container_of(ref, struct super_block, s_usage_counter); + wake_up(&sb->s_usage_waitq); +} + +static inline int sb_usage_counter_init(struct super_block *sb) +{ + init_waitqueue_head(&sb->s_usage_waitq); + return percpu_ref_init(&sb->s_usage_counter, sb_usage_counter_release, + 0, GFP_KERNEL); +} + +static inline void sb_usage_counter_wait(struct super_block *sb) +{ + percpu_ref_kill(&sb->s_usage_counter); + wait_event(sb->s_usage_waitq, percpu_ref_is_zero(&sb->s_usage_counter)); +} + /* * VFS helper functions.. */ From patchwork Wed Apr 14 13:47:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhang Yi X-Patchwork-Id: 12202819 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3117C43611 for ; Wed, 14 Apr 2021 13:39:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BAC6A611B0 for ; Wed, 14 Apr 2021 13:39:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351438AbhDNNkG (ORCPT ); Wed, 14 Apr 2021 09:40:06 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:16914 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351417AbhDNNkD (ORCPT ); Wed, 14 Apr 2021 09:40:03 -0400 Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.59]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FL3SV03CtzjZRZ; Wed, 14 Apr 2021 21:37:46 +0800 (CST) Received: from huawei.com (10.175.127.227) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.498.0; Wed, 14 Apr 2021 21:39:30 +0800 From: Zhang Yi To: CC: , , , , , Subject: [RFC PATCH v2 7/7] ext4: fix race between blkdev_releasepage() and ext4_put_super() Date: Wed, 14 Apr 2021 21:47:37 +0800 Message-ID: <20210414134737.2366971-8-yi.zhang@huawei.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20210414134737.2366971-1-yi.zhang@huawei.com> References: <20210414134737.2366971-1-yi.zhang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.127.227] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org There still exist a use after free issue when accessing the journal structure and ext4_sb_info structure on freeing bdev buffers in bdev_try_to_free_page(). The problem is bdev_try_to_free_page() could be raced by ext4_put_super(), it dose freeing sb->s_fs_info and sbi->s_journal while release page progress are still accessing them. So it could end up trigger use-after-free or NULL pointer dereference. drop cache umount filesystem blkdev_releasepage() get sb bdev_try_to_free_page() ext4_put_super() kfree(journal) if access EXT4_SB(sb)->s_journal <-- leader to use after free sb->s_fs_info = NULL; kfree(sbi) if access EXT4_SB(sb)->s_journal <-- trigger NULL pointer dereference The above race could also happens on the error path of ext4_fill_super(). Now the bdev_try_to_free_page() is under RCU protection, this patch fix this race by use sb->s_usage_counter to make bdev_try_to_free_page() safe against concurrent kill_block_super(). Fixes: c39a7f84d784 ("ext4: provide function to release metadata pages under memory pressure") Signed-off-by: Zhang Yi --- fs/ext4/super.c | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 55c7a0d8d77d..14eedd8e5bd3 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1172,6 +1172,12 @@ static void ext4_put_super(struct super_block *sb) */ ext4_unregister_sysfs(sb); + /* + * Prevent racing with bdev_try_to_free_page() access the sbi and + * journal concurrently. + */ + sb_usage_counter_wait(sb); + if (sbi->s_journal) { aborted = is_journal_aborted(sbi->s_journal); err = jbd2_journal_destroy(sbi->s_journal); @@ -1248,6 +1254,7 @@ static void ext4_put_super(struct super_block *sb) kthread_stop(sbi->s_mmp_tsk); brelse(sbi->s_sbh); sb->s_fs_info = NULL; + percpu_ref_exit(&sb->s_usage_counter); /* * Now that we are completely done shutting down the * superblock, we need to actually destroy the kobject. @@ -1450,15 +1457,22 @@ static int ext4_nfs_commit_metadata(struct inode *inode) static int bdev_try_to_free_page(struct super_block *sb, struct page *page, gfp_t wait) { - journal_t *journal = EXT4_SB(sb)->s_journal; + int ret = 0; WARN_ON(PageChecked(page)); if (!page_has_buffers(page)) return 0; - if (journal) - return jbd2_journal_try_to_free_buffers(journal, page); - return 0; + /* Racing with umount filesystem concurrently? */ + if (percpu_ref_tryget_live(&sb->s_usage_counter)) { + journal_t *journal = EXT4_SB(sb)->s_journal; + + if (journal) + ret = jbd2_journal_try_to_free_buffers(journal, page); + percpu_ref_put(&sb->s_usage_counter); + } + + return ret; } #ifdef CONFIG_FS_ENCRYPTION @@ -4709,6 +4723,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) spin_lock_init(&sbi->s_error_lock); INIT_WORK(&sbi->s_error_work, flush_stashed_error_work); + if (sb_usage_counter_init(sb)) + goto failed_mount2a; + /* Register extent status tree shrinker */ if (ext4_es_register_shrinker(sbi)) goto failed_mount3; @@ -5148,6 +5165,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) ext4_xattr_destroy_cache(sbi->s_ea_block_cache); sbi->s_ea_block_cache = NULL; + sb->s_bdev->bd_super = NULL; + sb_usage_counter_wait(sb); if (sbi->s_journal) { jbd2_journal_destroy(sbi->s_journal); sbi->s_journal = NULL; @@ -5155,6 +5174,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) failed_mount3a: ext4_es_unregister_shrinker(sbi); failed_mount3: + percpu_ref_exit(&sb->s_usage_counter); +failed_mount2a: flush_work(&sbi->s_error_work); del_timer_sync(&sbi->s_err_report); if (sbi->s_mmp_tsk)