From patchwork Mon May 15 19:22:51 2023
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13242116
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 1/6] btrfs: use a linked list for tracking per-transaction/log dirty buffers
Date: Mon, 15 May 2023 21:22:51 +0200
Message-Id: <20230515192256.29006-2-hch@lst.de>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230515192256.29006-1-hch@lst.de>
References: <20230515192256.29006-1-hch@lst.de>
X-Mailing-List: linux-btrfs@vger.kernel.org

Currently there is an extent_io_tree per transaction and per tree_log root to
track the dirty buffer ranges in the transaction or tree_log.  This is fairly
inefficient, because:

 - entries are only added until the transaction / log_tree commit happens, at
   which point they are walked sequentially, so a search tree buys nothing
   over a plain list
 - tracking only a dirty byte range means another mechanism is needed to get
   from that range back to the actual dirty buffer

Instead, switch the tracking to one linked list per transaction, plus two
lists per root for the two tree logs, each linking a small new object that
points directly at the buffer.  Note that the list_head can't simply be
embedded into the extent_buffer structure, because a buffer can be part of
more than one transaction or tree_log at the same time.  This also means the
existing error propagation based on eb->log_index never fully worked, as that
index gets overwritten once a buffer is added to a new dirty tree.  The new
code instead looks directly at the EXTENT_BUFFER_WRITE_ERR bit, which doesn't
have this problem.
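To make the new flow easier to follow, here is a condensed sketch of how a
transaction commit ends up consuming the list (it mirrors the
btrfs_write_and_wait_transaction() change further down in this patch; blk_plug
handling and the log-tree variants are left out, and the function name below
is purely illustrative, not part of the patch):

struct dirty_buffer {
	struct list_head	wb_entry;	/* links the buffer into a dirty list */
	struct extent_buffer	*eb;		/* tracked buffer, with a reference held */
};

static int commit_dirty_buffers_sketch(struct btrfs_trans_handle *trans)
{
	LIST_HEAD(dirty_buffers);
	int ret, ret2;

	/* Detach the per-transaction list under its spinlock. */
	spin_lock(&trans->transaction->dirty_buffers_lock);
	list_splice_init(&trans->transaction->dirty_buffers, &dirty_buffers);
	spin_unlock(&trans->transaction->dirty_buffers_lock);

	/* Start writeback on all tracked buffers, sorted by disk offset. */
	ret = btrfs_write_buffers(trans->fs_info, &dirty_buffers);

	/* Wait for completion, pick up EXTENT_BUFFER_WRITE_ERR, drop refs. */
	ret2 = btrfs_wait_buffers(trans->fs_info, &dirty_buffers);

	return ret ? ret : ret2;
}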
Signed-off-by: Christoph Hellwig --- fs/btrfs/Makefile | 2 +- fs/btrfs/ctree.h | 3 +- fs/btrfs/disk-io.c | 72 ++++++------- fs/btrfs/extent-io-tree.h | 10 -- fs/btrfs/extent-tree.c | 21 +--- fs/btrfs/extent_buffer.c | 99 ++++++++++++++++++ fs/btrfs/extent_buffer.h | 23 +++++ fs/btrfs/extent_io.c | 59 +---------- fs/btrfs/extent_io.h | 2 - fs/btrfs/fs.h | 3 - fs/btrfs/tests/extent-io-tests.c | 2 - fs/btrfs/transaction.c | 168 +++---------------------------- fs/btrfs/transaction.h | 3 +- fs/btrfs/tree-log.c | 71 ++++++++----- fs/btrfs/zoned.c | 4 +- include/trace/events/btrfs.h | 16 +-- 16 files changed, 232 insertions(+), 326 deletions(-) create mode 100644 fs/btrfs/extent_buffer.c create mode 100644 fs/btrfs/extent_buffer.h diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 90d53209755bf8..661cead213d6df 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -24,7 +24,7 @@ obj-$(CONFIG_BTRFS_FS) := btrfs.o btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ file-item.o inode-item.o disk-io.o \ - transaction.o inode.o file.o defrag.o \ + transaction.o inode.o file.o defrag.o extent_buffer.o \ extent_map.o sysfs.o accessors.o xattr.o ordered-data.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \ diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5af61480dde613..5a5fda077a0288 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -202,7 +202,8 @@ struct btrfs_root { struct btrfs_root_item root_item; struct btrfs_key root_key; struct btrfs_fs_info *fs_info; - struct extent_io_tree dirty_log_pages; + struct list_head dirty_buffers[2]; + spinlock_t dirty_buffers_lock; struct mutex objectid_mutex; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f58806b798cd65..3b81d6ec1b47b8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -24,6 +24,7 @@ #include "transaction.h" #include "btrfs_inode.h" #include "bio.h" +#include "extent_buffer.h" #include "print-tree.h" #include "locking.h" #include "tree-log.h" @@ -64,9 +65,6 @@ static void btrfs_destroy_ordered_extents(struct btrfs_root *root); static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct btrfs_fs_info *fs_info); static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root); -static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info, - struct extent_io_tree *dirty_pages, - int mark); static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info, struct extent_io_tree *pinned_extents); static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info); @@ -693,8 +691,9 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, root->last_log_commit = 0; root->anon_dev = 0; if (!dummy) { - extent_io_tree_init(fs_info, &root->dirty_log_pages, - IO_TREE_ROOT_DIRTY_LOG_PAGES); + INIT_LIST_HEAD(&root->dirty_buffers[0]); + INIT_LIST_HEAD(&root->dirty_buffers[1]); + spin_lock_init(&root->dirty_buffers_lock); extent_io_tree_init(fs_info, &root->log_csum_range, IO_TREE_LOG_CSUM_RANGE); } @@ -4208,18 +4207,16 @@ static void warn_about_uncommitted_trans(struct btrfs_fs_info *fs_info) */ ASSERT(test_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags)); list_for_each_entry_safe(trans, tmp, &fs_info->trans_list, list) { - struct extent_state *cached = NULL; + struct dirty_buffer *db; u64 dirty_bytes = 0; - u64 cur = 0; - u64 found_start; - u64 found_end; found = true; - while (!find_first_extent_bit(&trans->dirty_pages, cur, - &found_start, &found_end, 
EXTENT_DIRTY, &cached)) { - dirty_bytes += found_end + 1 - found_start; - cur = found_end + 1; - } + + spin_lock(&trans->dirty_buffers_lock); + list_for_each_entry(db, &trans->dirty_buffers, wb_entry) + dirty_bytes += db->eb->len; + spin_unlock(&trans->dirty_buffers_lock); + btrfs_warn(fs_info, "transaction %llu (with %llu dirty metadata bytes) is not committed", trans->transid, dirty_bytes); @@ -4707,38 +4704,30 @@ static void btrfs_destroy_all_delalloc_inodes(struct btrfs_fs_info *fs_info) spin_unlock(&fs_info->delalloc_root_lock); } -static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info, - struct extent_io_tree *dirty_pages, - int mark) +static void btrfs_destroy_dirty_buffers(struct btrfs_fs_info *fs_info, + struct btrfs_transaction *cur_trans) { - int ret; - struct extent_buffer *eb; - u64 start = 0; - u64 end; + struct dirty_buffer *db; - while (1) { - ret = find_first_extent_bit(dirty_pages, start, &start, &end, - mark, NULL); - if (ret) - break; + spin_lock(&cur_trans->dirty_buffers_lock); + while ((db = list_first_entry_or_null(&cur_trans->dirty_buffers, + struct dirty_buffer, wb_entry))) { + struct extent_buffer *eb = db->eb; - clear_extent_bits(dirty_pages, start, end, mark); - while (start <= end) { - eb = find_extent_buffer(fs_info, start); - start += fs_info->nodesize; - if (!eb) - continue; + list_del_init(&db->wb_entry); + spin_unlock(&cur_trans->dirty_buffers_lock); - btrfs_tree_lock(eb); - wait_on_extent_buffer_writeback(eb); - btrfs_clear_buffer_dirty(NULL, eb); - btrfs_tree_unlock(eb); + btrfs_tree_lock(eb); + wait_on_extent_buffer_writeback(eb); + btrfs_clear_buffer_dirty(NULL, eb); + btrfs_tree_unlock(eb); - free_extent_buffer_stale(eb); - } - } + free_extent_buffer_stale(eb); + kfree(db); - return ret; + spin_lock(&cur_trans->dirty_buffers_lock); + } + spin_unlock(&cur_trans->dirty_buffers_lock); } static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info, @@ -4866,8 +4855,7 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, btrfs_destroy_delayed_inodes(fs_info); - btrfs_destroy_marked_extents(fs_info, &cur_trans->dirty_pages, - EXTENT_DIRTY); + btrfs_destroy_dirty_buffers(fs_info, cur_trans); btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents); cur_trans->state =TRANS_STATE_COMPLETED; diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index 21766e49ec02e7..0dae0a16a87cb7 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -12,13 +12,11 @@ enum { ENUM_BIT(EXTENT_DIRTY), ENUM_BIT(EXTENT_UPTODATE), ENUM_BIT(EXTENT_LOCKED), - ENUM_BIT(EXTENT_NEW), ENUM_BIT(EXTENT_DELALLOC), ENUM_BIT(EXTENT_DEFRAG), ENUM_BIT(EXTENT_BOUNDARY), ENUM_BIT(EXTENT_NODATASUM), ENUM_BIT(EXTENT_CLEAR_META_RESV), - ENUM_BIT(EXTENT_NEED_WAIT), ENUM_BIT(EXTENT_NORESERVE), ENUM_BIT(EXTENT_QGROUP_RESERVED), ENUM_BIT(EXTENT_CLEAR_DATA_RESV), @@ -68,8 +66,6 @@ enum { IO_TREE_BTREE_INODE_IO, IO_TREE_INODE_IO, IO_TREE_RELOC_BLOCKS, - IO_TREE_TRANS_DIRTY_PAGES, - IO_TREE_ROOT_DIRTY_LOG_PAGES, IO_TREE_INODE_FILE_EXTENT, IO_TREE_LOG_CSUM_RANGE, IO_TREE_SELFTEST, @@ -210,12 +206,6 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start, cached_state, GFP_NOFS); } -static inline int set_extent_new(struct extent_io_tree *tree, u64 start, - u64 end) -{ - return set_extent_bit(tree, start, end, EXTENT_NEW, NULL, GFP_NOFS); -} - int find_first_extent_bit(struct extent_io_tree *tree, u64 start, u64 *start_ret, u64 *end_ret, u32 bits, struct extent_state **cached_state); diff --git 
a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 99b9ab5ccf932a..20d7cb51de944f 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -17,6 +17,7 @@ #include #include #include "ctree.h" +#include "extent_buffer.h" #include "extent-tree.h" #include "tree-log.h" #include "disk-io.h" @@ -4822,23 +4823,9 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, btrfs_set_header_owner(buf, owner); write_extent_buffer_fsid(buf, fs_info->fs_devices->metadata_uuid); write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid); - if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) { - buf->log_index = root->log_transid % 2; - /* - * we allow two log transactions at a time, use different - * EXTENT bit to differentiate dirty pages. - */ - if (buf->log_index == 0) - set_extent_dirty(&root->dirty_log_pages, buf->start, - buf->start + buf->len - 1, GFP_NOFS); - else - set_extent_new(&root->dirty_log_pages, buf->start, - buf->start + buf->len - 1); - } else { - buf->log_index = -1; - set_extent_dirty(&trans->transaction->dirty_pages, buf->start, - buf->start + buf->len - 1, GFP_NOFS); - } + + btrfs_add_to_dirty_buffers_list(buf, trans->transaction, root); + /* this returns a buffer locked for blocking */ return buf; } diff --git a/fs/btrfs/extent_buffer.c b/fs/btrfs/extent_buffer.c new file mode 100644 index 00000000000000..d20fc0d113968f --- /dev/null +++ b/fs/btrfs/extent_buffer.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023 Christoph Hellwig. + */ + +#include +#include "extent_buffer.h" +#include "extent_io.h" +#include "fs.h" +#include "transaction.h" + +void btrfs_add_to_dirty_buffers_list(struct extent_buffer *eb, + struct btrfs_transaction *trans, + struct btrfs_root *root) +{ + struct dirty_buffer *db = kmalloc(sizeof(*db), GFP_NOFS | __GFP_NOFAIL); + + db->eb = eb; + atomic_inc(&eb->refs); /* for the dirty list */ + if (root && root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) { + spin_lock(&root->dirty_buffers_lock); + list_add_tail(&db->wb_entry, + &root->dirty_buffers[root->log_transid % 2]); + spin_unlock(&root->dirty_buffers_lock); + } else { + spin_lock(&trans->dirty_buffers_lock); + list_add_tail(&db->wb_entry, &trans->dirty_buffers); + spin_unlock(&trans->dirty_buffers_lock); + } +} + +static int extent_buffer_cmp(void *priv, const struct list_head *a, + const struct list_head *b) +{ + s64 diff = container_of(a, struct dirty_buffer, wb_entry)->eb->start - + container_of(b, struct dirty_buffer, wb_entry)->eb->start; + + if (diff < 0) + return -1; + if (diff > 0) + return 1; + return 0; +} + +int btrfs_write_buffers(struct btrfs_fs_info *fs_info, struct list_head *list) +{ + struct address_space *mapping = fs_info->btree_inode->i_mapping; + struct dirty_buffer *db; + int werr = 0, err; + + list_sort(NULL, list, extent_buffer_cmp); + + list_for_each_entry(db, list, wb_entry) { + err = filemap_fdatawrite_range(mapping, db->eb->start, + db->eb->start + db->eb->len - 1); + if (err) + werr = err; + cond_resched(); + } + + return werr; +} + +int btrfs_wait_buffers(struct btrfs_fs_info *fs_info, struct list_head *list) +{ + struct address_space *mapping = fs_info->btree_inode->i_mapping; + struct dirty_buffer *db; + int werr = 0, err; + + while ((db = list_first_entry_or_null(list, struct dirty_buffer, + wb_entry))) { + list_del_init(&db->wb_entry); + + err = filemap_fdatawait_range(mapping, db->eb->start, + db->eb->start + db->eb->len - 1); + if (err) + werr = err; + if (!werr && 
test_bit(EXTENT_BUFFER_WRITE_ERR, &db->eb->bflags)) + werr = -EIO; + free_extent_buffer(db->eb); + kfree(db); + + cond_resched(); + } + + return werr; +} + +void btrfs_release_buffers(struct list_head *list) +{ + struct dirty_buffer *db; + + while ((db = list_first_entry_or_null(list, struct dirty_buffer, + wb_entry))) { + list_del_init(&db->wb_entry); + free_extent_buffer(db->eb); + kfree(db); + } +} diff --git a/fs/btrfs/extent_buffer.h b/fs/btrfs/extent_buffer.h new file mode 100644 index 00000000000000..880b325d7b411a --- /dev/null +++ b/fs/btrfs/extent_buffer.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef BTRFS_EXTENT_BUFFER_H +#define BTRFS_EXTENT_BUFFER_H + +struct extent_buffer; +struct btrfs_fs_info; +struct btrfs_root; +struct btrfs_transaction; +struct list_head; + +struct dirty_buffer { + struct list_head wb_entry; + struct extent_buffer *eb; +}; + +void btrfs_add_to_dirty_buffers_list(struct extent_buffer *eb, + struct btrfs_transaction *trans, + struct btrfs_root *root); +int btrfs_write_buffers(struct btrfs_fs_info *fs_info, struct list_head *list); +int btrfs_wait_buffers(struct btrfs_fs_info *fs_info, struct list_head *list); +void btrfs_release_buffers(struct list_head *list); + +#endif /* BTRFS_EXTENT_BUFFER_H */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 5999ac3ee601db..d5937ed0962d38 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1659,8 +1659,11 @@ static bool lock_extent_buffer_for_io(struct extent_buffer *eb, static void set_btree_ioerr(struct extent_buffer *eb) { - struct btrfs_fs_info *fs_info = eb->fs_info; - + /* + * Propagate this error to btrfs_wait_buffers in case the buffer got + * written out by optimistic VM flushing and not the explicit writeback + * from btrfs_write_buffers(). + */ set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags); /* @@ -1676,58 +1679,6 @@ static void set_btree_ioerr(struct extent_buffer *eb) * the superblock. */ mapping_set_error(eb->fs_info->btree_inode->i_mapping, -EIO); - - /* - * If writeback for a btree extent that doesn't belong to a log tree - * failed, increment the counter transaction->eb_write_errors. - * We do this because while the transaction is running and before it's - * committing (when we call filemap_fdata[write|wait]_range against - * the btree inode), we might have - * btree_inode->i_mapping->a_ops->writepages() called by the VM - if it - * returns an error or an error happens during writeback, when we're - * committing the transaction we wouldn't know about it, since the pages - * can be no longer dirty nor marked anymore for writeback (if a - * subsequent modification to the extent buffer didn't happen before the - * transaction commit), which makes filemap_fdata[write|wait]_range not - * able to find the pages tagged with SetPageError at transaction - * commit time. So if this happens we must abort the transaction, - * otherwise we commit a super block with btree roots that point to - * btree nodes/leafs whose content on disk is invalid - either garbage - * or the content of some node/leaf from a past generation that got - * cowed or deleted and is no longer valid. - * - * Note: setting AS_EIO/AS_ENOSPC in the btree inode's i_mapping would - * not be enough - we need to distinguish between log tree extents vs - * non-log tree extents, and the next filemap_fdatawait_range() call - * will catch and clear such errors in the mapping - and that call might - * be from a log sync and not from a transaction commit. 
Also, checking - * for the eb flag EXTENT_BUFFER_WRITE_ERR at transaction commit time is - * not done and would not be reliable - the eb might have been released - * from memory and reading it back again means that flag would not be - * set (since it's a runtime flag, not persisted on disk). - * - * Using the flags below in the btree inode also makes us achieve the - * goal of AS_EIO/AS_ENOSPC when writepages() returns success, started - * writeback for all dirty pages and before filemap_fdatawait_range() - * is called, the writeback for all dirty pages had already finished - * with errors - because we were not using AS_EIO/AS_ENOSPC, - * filemap_fdatawait_range() would return success, as it could not know - * that writeback errors happened (the pages were no longer tagged for - * writeback). - */ - switch (eb->log_index) { - case -1: - set_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags); - break; - case 0: - set_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags); - break; - case 1: - set_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags); - break; - default: - BUG(); /* unexpected, logic error */ - } } /* diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index db9148bafd02c3..7c0e638346017e 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -83,8 +83,6 @@ struct extent_buffer { int read_mirror; struct rcu_head rcu_head; pid_t lock_owner; - /* >= 0 if eb belongs to a log tree, -1 otherwise */ - s8 log_index; struct rw_semaphore lock; diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index 840e4def18b519..f71281408102c3 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -81,9 +81,6 @@ enum { BTRFS_FS_QUOTA_ENABLED, BTRFS_FS_UPDATE_UUID_TREE_GEN, BTRFS_FS_CREATING_FREE_SPACE_TREE, - BTRFS_FS_BTREE_ERR, - BTRFS_FS_LOG1_ERR, - BTRFS_FS_LOG2_ERR, BTRFS_FS_QUOTA_OVERRIDE, /* Used to record internally whether fs has been frozen */ BTRFS_FS_FROZEN, diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c index dfc5c7fa603899..df7f526e0718e0 100644 --- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -75,13 +75,11 @@ static void extent_flag_to_str(const struct extent_state *state, char *dest) PRINT_ONE_FLAG(state, dest, cur, DIRTY); PRINT_ONE_FLAG(state, dest, cur, UPTODATE); PRINT_ONE_FLAG(state, dest, cur, LOCKED); - PRINT_ONE_FLAG(state, dest, cur, NEW); PRINT_ONE_FLAG(state, dest, cur, DELALLOC); PRINT_ONE_FLAG(state, dest, cur, DEFRAG); PRINT_ONE_FLAG(state, dest, cur, BOUNDARY); PRINT_ONE_FLAG(state, dest, cur, NODATASUM); PRINT_ONE_FLAG(state, dest, cur, CLEAR_META_RESV); - PRINT_ONE_FLAG(state, dest, cur, NEED_WAIT); PRINT_ONE_FLAG(state, dest, cur, NORESERVE); PRINT_ONE_FLAG(state, dest, cur, QGROUP_RESERVED); PRINT_ONE_FLAG(state, dest, cur, CLEAR_DATA_RESV); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index fe0f00e717a834..5353bfd8309425 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -3,6 +3,7 @@ * Copyright (C) 2007 Oracle. All rights reserved. 
*/ +#include #include #include #include @@ -12,6 +13,7 @@ #include #include #include +#include "extent_buffer.h" #include "misc.h" #include "ctree.h" #include "disk-io.h" @@ -191,7 +193,8 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans) list_del_init(&root->dirty_list); free_extent_buffer(root->commit_root); root->commit_root = btrfs_root_node(root); - extent_io_tree_release(&root->dirty_log_pages); + ASSERT(list_empty_careful(&root->dirty_buffers[0])); + ASSERT(list_empty_careful(&root->dirty_buffers[1])); btrfs_qgroup_clean_swapped_blocks(root); } @@ -375,8 +378,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); - extent_io_tree_init(fs_info, &cur_trans->dirty_pages, - IO_TREE_TRANS_DIRTY_PAGES); + INIT_LIST_HEAD(&cur_trans->dirty_buffers); + spin_lock_init(&cur_trans->dirty_buffers_lock); extent_io_tree_init(fs_info, &cur_trans->pinned_extents, IO_TREE_FS_PINNED_EXTENTS); fs_info->generation++; @@ -1039,141 +1042,6 @@ int btrfs_end_transaction_throttle(struct btrfs_trans_handle *trans) return __btrfs_end_transaction(trans, 1); } -/* - * when btree blocks are allocated, they have some corresponding bits set for - * them in one of two extent_io trees. This is used to make sure all of - * those extents are sent to disk but does not wait on them - */ -int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info, - struct extent_io_tree *dirty_pages, int mark) -{ - int err = 0; - int werr = 0; - struct address_space *mapping = fs_info->btree_inode->i_mapping; - struct extent_state *cached_state = NULL; - u64 start = 0; - u64 end; - - while (!find_first_extent_bit(dirty_pages, start, &start, &end, - mark, &cached_state)) { - bool wait_writeback = false; - - err = convert_extent_bit(dirty_pages, start, end, - EXTENT_NEED_WAIT, - mark, &cached_state); - /* - * convert_extent_bit can return -ENOMEM, which is most of the - * time a temporary error. So when it happens, ignore the error - * and wait for writeback of this range to finish - because we - * failed to set the bit EXTENT_NEED_WAIT for the range, a call - * to __btrfs_wait_marked_extents() would not know that - * writeback for this range started and therefore wouldn't - * wait for it to finish - we don't want to commit a - * superblock that points to btree nodes/leafs for which - * writeback hasn't finished yet (and without errors). - * We cleanup any entries left in the io tree when committing - * the transaction (through extent_io_tree_release()). - */ - if (err == -ENOMEM) { - err = 0; - wait_writeback = true; - } - if (!err) - err = filemap_fdatawrite_range(mapping, start, end); - if (err) - werr = err; - else if (wait_writeback) - werr = filemap_fdatawait_range(mapping, start, end); - free_extent_state(cached_state); - cached_state = NULL; - cond_resched(); - start = end + 1; - } - return werr; -} - -/* - * when btree blocks are allocated, they have some corresponding bits set for - * them in one of two extent_io trees. This is used to make sure all of - * those extents are on disk for transaction or log commit. 
We wait - * on all the pages and clear them from the dirty pages state tree - */ -static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info, - struct extent_io_tree *dirty_pages) -{ - int err = 0; - int werr = 0; - struct address_space *mapping = fs_info->btree_inode->i_mapping; - struct extent_state *cached_state = NULL; - u64 start = 0; - u64 end; - - while (!find_first_extent_bit(dirty_pages, start, &start, &end, - EXTENT_NEED_WAIT, &cached_state)) { - /* - * Ignore -ENOMEM errors returned by clear_extent_bit(). - * When committing the transaction, we'll remove any entries - * left in the io tree. For a log commit, we don't remove them - * after committing the log because the tree can be accessed - * concurrently - we do it only at transaction commit time when - * it's safe to do it (through extent_io_tree_release()). - */ - err = clear_extent_bit(dirty_pages, start, end, - EXTENT_NEED_WAIT, &cached_state); - if (err == -ENOMEM) - err = 0; - if (!err) - err = filemap_fdatawait_range(mapping, start, end); - if (err) - werr = err; - free_extent_state(cached_state); - cached_state = NULL; - cond_resched(); - start = end + 1; - } - if (err) - werr = err; - return werr; -} - -static int btrfs_wait_extents(struct btrfs_fs_info *fs_info, - struct extent_io_tree *dirty_pages) -{ - bool errors = false; - int err; - - err = __btrfs_wait_marked_extents(fs_info, dirty_pages); - if (test_and_clear_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags)) - errors = true; - - if (errors && !err) - err = -EIO; - return err; -} - -int btrfs_wait_tree_log_extents(struct btrfs_root *log_root, int mark) -{ - struct btrfs_fs_info *fs_info = log_root->fs_info; - struct extent_io_tree *dirty_pages = &log_root->dirty_log_pages; - bool errors = false; - int err; - - ASSERT(log_root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID); - - err = __btrfs_wait_marked_extents(fs_info, dirty_pages); - if ((mark & EXTENT_DIRTY) && - test_and_clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags)) - errors = true; - - if ((mark & EXTENT_NEW) && - test_and_clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags)) - errors = true; - - if (errors && !err) - err = -EIO; - return err; -} - /* * When btree blocks are allocated the corresponding extents are marked dirty. 
* This function ensures such extents are persisted on disk for transaction or @@ -1183,25 +1051,22 @@ int btrfs_wait_tree_log_extents(struct btrfs_root *log_root, int mark) */ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans) { - int ret; - int ret2; - struct extent_io_tree *dirty_pages = &trans->transaction->dirty_pages; - struct btrfs_fs_info *fs_info = trans->fs_info; + LIST_HEAD(dirty_buffers); struct blk_plug plug; + int ret, ret2; + + spin_lock(&trans->transaction->dirty_buffers_lock); + list_splice_init(&trans->transaction->dirty_buffers, &dirty_buffers); + spin_unlock(&trans->transaction->dirty_buffers_lock); blk_start_plug(&plug); - ret = btrfs_write_marked_extents(fs_info, dirty_pages, EXTENT_DIRTY); + ret = btrfs_write_buffers(trans->fs_info, &dirty_buffers); blk_finish_plug(&plug); - ret2 = btrfs_wait_extents(fs_info, dirty_pages); - - extent_io_tree_release(&trans->transaction->dirty_pages); + ret2 = btrfs_wait_buffers(trans->fs_info, &dirty_buffers); if (ret) return ret; - else if (ret2) - return ret2; - else - return 0; + return ret2; } /* @@ -2443,9 +2308,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) btrfs_commit_device_sizes(cur_trans); - clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags); - clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags); - btrfs_trans_release_chunk_metadata(trans); /* diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 8e9fa23bd7fed7..eade7eb41a07ef 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -47,7 +47,8 @@ struct btrfs_transaction { enum btrfs_trans_state state; int aborted; struct list_head list; - struct extent_io_tree dirty_pages; + struct list_head dirty_buffers; + spinlock_t dirty_buffers_lock; time64_t start_time; wait_queue_head_t writer_wait; wait_queue_head_t commit_wait; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 9b212e8c70cc58..012b7e03d57968 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -8,6 +8,7 @@ #include #include #include +#include "extent_buffer.h" #include "misc.h" #include "ctree.h" #include "tree-log.h" @@ -2847,6 +2848,24 @@ static inline void btrfs_remove_all_log_ctxs(struct btrfs_root *root, } } +static void btrfs_pick_all_dirty_log_buffers(struct btrfs_root *log, + struct list_head *dirty_buffers) +{ + spin_lock(&log->dirty_buffers_lock); + list_splice_tail_init(&log->dirty_buffers[0], dirty_buffers); + list_splice_tail_init(&log->dirty_buffers[1], dirty_buffers); + spin_unlock(&log->dirty_buffers_lock); +} + +static void btrfs_pick_dirty_log_buffers(struct btrfs_root *log, + struct list_head *dirty_buffers, + int log_transid) +{ + spin_lock(&log->dirty_buffers_lock); + list_splice_init(&log->dirty_buffers[log_transid % 2], dirty_buffers); + spin_unlock(&log->dirty_buffers_lock); +} + /* * btrfs_sync_log does sends a given tree log down to the disk and * updates the super blocks to record it. 
When this call is done, @@ -2864,7 +2883,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, { int index1; int index2; - int mark; int ret; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log = root->log_root; @@ -2875,6 +2893,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, struct blk_plug plug; u64 log_root_start; u64 log_root_level; + LIST_HEAD(dirty_log_buffers); + LIST_HEAD(dirty_log_root_buffers); mutex_lock(&root->log_mutex); log_transid = ctx->log_transid; @@ -2917,16 +2937,15 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, goto out; } - if (log_transid % 2 == 0) - mark = EXTENT_DIRTY; - else - mark = EXTENT_NEW; + blk_start_plug(&plug); - /* we start IO on all the marked extents here, but we don't actually - * wait for them until later. + /* + * Start I/O on all dirty buffers, but don't actually wait for them + * until later. */ - blk_start_plug(&plug); - ret = btrfs_write_marked_extents(fs_info, &log->dirty_log_pages, mark); + btrfs_pick_dirty_log_buffers(log, &dirty_log_buffers, log_transid); + ret = btrfs_write_buffers(fs_info, &dirty_log_buffers); + /* * -EAGAIN happens when someone, e.g., a concurrent transaction * commit, writes a dirty extent in this tree-log commit. This @@ -3008,7 +3027,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, btrfs_err(fs_info, "failed to update log for root %llu ret %d", root->root_key.objectid, ret); - btrfs_wait_tree_log_extents(log, mark); + btrfs_wait_buffers(fs_info, &dirty_log_buffers); mutex_unlock(&log_root_tree->log_mutex); goto out; } @@ -3024,7 +3043,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, index2 = root_log_ctx.log_transid % 2; if (atomic_read(&log_root_tree->log_commit[index2])) { blk_finish_plug(&plug); - ret = btrfs_wait_tree_log_extents(log, mark); + ret = btrfs_wait_buffers(fs_info, &dirty_log_buffers); wait_log_commit(log_root_tree, root_log_ctx.log_transid); mutex_unlock(&log_root_tree->log_mutex); @@ -3046,15 +3065,14 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, */ if (btrfs_need_log_full_commit(trans)) { blk_finish_plug(&plug); - btrfs_wait_tree_log_extents(log, mark); + btrfs_wait_buffers(fs_info, &dirty_log_buffers); mutex_unlock(&log_root_tree->log_mutex); ret = BTRFS_LOG_FORCE_COMMIT; goto out_wake_log_root; } - ret = btrfs_write_marked_extents(fs_info, - &log_root_tree->dirty_log_pages, - EXTENT_DIRTY | EXTENT_NEW); + btrfs_pick_all_dirty_log_buffers(log_root_tree, &dirty_log_root_buffers); + ret = btrfs_write_buffers(fs_info, &dirty_log_root_buffers); blk_finish_plug(&plug); /* * As described above, -EAGAIN indicates a hole in the extents. 
We @@ -3063,7 +3081,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, */ if (ret == -EAGAIN && btrfs_is_zoned(fs_info)) { btrfs_set_log_full_commit(trans); - btrfs_wait_tree_log_extents(log, mark); + btrfs_wait_buffers(fs_info, &dirty_log_buffers); mutex_unlock(&log_root_tree->log_mutex); goto out_wake_log_root; } else if (ret) { @@ -3071,10 +3089,9 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, mutex_unlock(&log_root_tree->log_mutex); goto out_wake_log_root; } - ret = btrfs_wait_tree_log_extents(log, mark); + ret = btrfs_wait_buffers(fs_info, &dirty_log_buffers); if (!ret) - ret = btrfs_wait_tree_log_extents(log_root_tree, - EXTENT_NEW | EXTENT_DIRTY); + ret = btrfs_wait_buffers(fs_info, &dirty_log_root_buffers); if (ret) { btrfs_set_log_full_commit(trans); mutex_unlock(&log_root_tree->log_mutex); @@ -3160,6 +3177,9 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, atomic_set(&root->log_commit[index1], 0); mutex_unlock(&root->log_mutex); + btrfs_release_buffers(&dirty_log_buffers); + btrfs_release_buffers(&dirty_log_root_buffers); + /* * The barrier before waitqueue_active (in cond_wake_up) is needed so * all the updates above are seen by the woken threads. It might not be @@ -3177,6 +3197,7 @@ static void free_log_tree(struct btrfs_trans_handle *trans, .free = 1, .process_func = process_one_buffer }; + LIST_HEAD(dirty_buffers); if (log->node) { ret = walk_log_tree(trans, log, &wc); @@ -3198,11 +3219,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans, * them out and wait for their writeback to complete, so * that we properly cleanup their state and pages. */ - btrfs_write_marked_extents(log->fs_info, - &log->dirty_log_pages, - EXTENT_DIRTY | EXTENT_NEW); - btrfs_wait_tree_log_extents(log, - EXTENT_DIRTY | EXTENT_NEW); + btrfs_pick_all_dirty_log_buffers(log, &dirty_buffers); + btrfs_write_buffers(log->fs_info, &dirty_buffers); + btrfs_wait_buffers(log->fs_info, &dirty_buffers); if (trans) btrfs_abort_transaction(trans, ret); @@ -3211,8 +3230,8 @@ static void free_log_tree(struct btrfs_trans_handle *trans, } } - clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, - EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); + btrfs_pick_all_dirty_log_buffers(log, &dirty_buffers); + btrfs_release_buffers(&dirty_buffers); extent_io_tree_release(&log->log_csum_range); btrfs_put_root(log); diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index bda301a55cbe3b..e4b8134ab70166 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -6,6 +6,7 @@ #include #include #include +#include "extent_buffer.h" #include "ctree.h" #include "volumes.h" #include "zoned.h" @@ -1611,8 +1612,7 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans, memzero_extent_buffer(eb, 0, eb->len); set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags); set_extent_buffer_dirty(eb); - set_extent_bits_nowait(&trans->dirty_pages, eb->start, - eb->start + eb->len - 1, EXTENT_DIRTY); + btrfs_add_to_dirty_buffers_list(eb, trans, NULL); } bool btrfs_use_zone_append(struct btrfs_bio *bbio) diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 8ea9cea9bfeb4d..aae33b9067b918 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -86,8 +86,6 @@ struct find_free_extent_ctl; EM( IO_TREE_BTREE_INODE_IO, "BTREE_INODE_IO") \ EM( IO_TREE_INODE_IO, "INODE_IO") \ EM( IO_TREE_RELOC_BLOCKS, "RELOC_BLOCKS") \ - EM( IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES") \ - EM( IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES") \ EM( IO_TREE_INODE_FILE_EXTENT, "INODE_FILE_EXTENT") \ 
EM( IO_TREE_LOG_CSUM_RANGE, "LOG_CSUM_RANGE") \ EMe(IO_TREE_SELFTEST, "SELFTEST") @@ -147,13 +145,11 @@ FLUSH_STATES { EXTENT_DIRTY, "DIRTY"}, \ { EXTENT_UPTODATE, "UPTODATE"}, \ { EXTENT_LOCKED, "LOCKED"}, \ - { EXTENT_NEW, "NEW"}, \ { EXTENT_DELALLOC, "DELALLOC"}, \ { EXTENT_DEFRAG, "DEFRAG"}, \ { EXTENT_BOUNDARY, "BOUNDARY"}, \ { EXTENT_NODATASUM, "NODATASUM"}, \ { EXTENT_CLEAR_META_RESV, "CLEAR_META_RESV"}, \ - { EXTENT_NEED_WAIT, "NEED_WAIT"}, \ { EXTENT_NORESERVE, "NORESERVE"}, \ { EXTENT_QGROUP_RESERVED, "QGROUP_RESERVED"}, \ { EXTENT_CLEAR_DATA_RESV, "CLEAR_DATA_RESV"}, \ @@ -2289,7 +2285,6 @@ DECLARE_EVENT_CLASS(btrfs_sleep_tree_lock, __field( u64, end_ns ) __field( u64, diff_ns ) __field( u64, owner ) - __field( int, is_log_tree ) ), TP_fast_assign_btrfs(eb->fs_info, @@ -2299,14 +2294,13 @@ DECLARE_EVENT_CLASS(btrfs_sleep_tree_lock, __entry->end_ns = ktime_get_ns(); __entry->diff_ns = __entry->end_ns - start_ns; __entry->owner = btrfs_header_owner(eb); - __entry->is_log_tree = (eb->log_index >= 0); ), TP_printk_btrfs( -"block=%llu generation=%llu start_ns=%llu end_ns=%llu diff_ns=%llu owner=%llu is_log_tree=%d", +"block=%llu generation=%llu start_ns=%llu end_ns=%llu diff_ns=%llu owner=%llu", __entry->block, __entry->generation, __entry->start_ns, __entry->end_ns, __entry->diff_ns, - __entry->owner, __entry->is_log_tree) + __entry->owner) ); DEFINE_EVENT(btrfs_sleep_tree_lock, btrfs_tree_read_lock, @@ -2330,19 +2324,17 @@ DECLARE_EVENT_CLASS(btrfs_locking_events, __field( u64, block ) __field( u64, generation ) __field( u64, owner ) - __field( int, is_log_tree ) ), TP_fast_assign_btrfs(eb->fs_info, __entry->block = eb->start; __entry->generation = btrfs_header_generation(eb); __entry->owner = btrfs_header_owner(eb); - __entry->is_log_tree = (eb->log_index >= 0); ), - TP_printk_btrfs("block=%llu generation=%llu owner=%llu is_log_tree=%d", + TP_printk_btrfs("block=%llu generation=%llu owner=%llu", __entry->block, __entry->generation, - __entry->owner, __entry->is_log_tree) + __entry->owner) ); #define DEFINE_BTRFS_LOCK_EVENT(name) \ From patchwork Mon May 15 19:22:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13242117 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88422C77B7D for ; Mon, 15 May 2023 19:23:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245034AbjEOTX3 (ORCPT ); Mon, 15 May 2023 15:23:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38936 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244979AbjEOTXV (ORCPT ); Mon, 15 May 2023 15:23:21 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B378BD2EE for ; Mon, 15 May 2023 12:23:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=rUVHQaxtQHKTTg38c9BwT9DVkb0g571xhIzdRFcekUA=; b=ET8TKu2HSIFmCtQUyF+aTEf85W 9QFtLRmYYQeiQNDKOhGxYWF+hvnyEmz4hH2fnoaO5N20Tg+5L9ZDbcTocSonAqOB0NxRN7EalqQwo 
yjWce82bzLmbrY1U29QQTULt8utoT3j0b/CqAi0PRY/mt6+aRVnMrctZRUVbSEcb8F1mQHvdrJtk5 8TmvptLEryjC6whLHE6BSKBr4au2ilYcAnHEzM4VA6WBCqiB8gSVgD+BHgw4S5Q/cJ2wt+IL1PpGZ rgYRGEEV1hYK78iq6OQkr9kJKOFMLCPseC7afy0HCpiMHDc4DcHjhoKvt3lwhXAoWr5NZkx7TSX8x BVfxmU7g==; Received: from [2001:4bb8:188:2249:ad42:acf3:6731:a041] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1pydmr-003HFT-31; Mon, 15 May 2023 19:23:14 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: linux-btrfs@vger.kernel.org Subject: [PATCH 2/6] btrfs: remove convert_extent_bit Date: Mon, 15 May 2023 21:22:52 +0200 Message-Id: <20230515192256.29006-3-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230515192256.29006-1-hch@lst.de> References: <20230515192256.29006-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The last user of this function just went away, so remove it. Signed-off-by: Christoph Hellwig --- fs/btrfs/extent-io-tree.c | 212 ----------------------------------- fs/btrfs/extent-io-tree.h | 4 - include/trace/events/btrfs.h | 41 ------- 3 files changed, 257 deletions(-) diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c index 29a225836e286e..bfda8e1debee1e 100644 --- a/fs/btrfs/extent-io-tree.c +++ b/fs/btrfs/extent-io-tree.c @@ -1194,218 +1194,6 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, cached_state, NULL, mask); } -/* - * Convert all bits in a given range from one bit to another - * - * @tree: the io tree to search - * @start: the start offset in bytes - * @end: the end offset in bytes (inclusive) - * @bits: the bits to set in this range - * @clear_bits: the bits to clear in this range - * @cached_state: state that we're going to cache - * - * This will go through and set bits for the given range. If any states exist - * already in this range they are set with the given bit and cleared of the - * clear_bits. This is only meant to be used by things that are mergeable, ie. - * converting from say DELALLOC to DIRTY. This is not meant to be used with - * boundary bits like LOCK. - * - * All allocations are done with GFP_NOFS. - */ -int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - u32 bits, u32 clear_bits, - struct extent_state **cached_state) -{ - struct extent_state *state; - struct extent_state *prealloc = NULL; - struct rb_node **p = NULL; - struct rb_node *parent = NULL; - int err = 0; - u64 last_start; - u64 last_end; - bool first_iteration = true; - - btrfs_debug_check_extent_io_range(tree, start, end); - trace_btrfs_convert_extent_bit(tree, start, end - start + 1, bits, - clear_bits); - -again: - if (!prealloc) { - /* - * Best effort, don't worry if extent state allocation fails - * here for the first iteration. We might have a cached state - * that matches exactly the target range, in which case no - * extent state allocations are needed. We'll only know this - * after locking the tree. - */ - prealloc = alloc_extent_state(GFP_NOFS); - if (!prealloc && !first_iteration) - return -ENOMEM; - } - - spin_lock(&tree->lock); - if (cached_state && *cached_state) { - state = *cached_state; - if (state->start <= start && state->end > start && - extent_state_in_tree(state)) - goto hit_next; - } - - /* - * This search will find all the extents that end after our range - * starts. 
- */ - state = tree_search_for_insert(tree, start, &p, &parent); - if (!state) { - prealloc = alloc_extent_state_atomic(prealloc); - if (!prealloc) { - err = -ENOMEM; - goto out; - } - prealloc->start = start; - prealloc->end = end; - insert_state_fast(tree, prealloc, p, parent, bits, NULL); - cache_state(prealloc, cached_state); - prealloc = NULL; - goto out; - } -hit_next: - last_start = state->start; - last_end = state->end; - - /* - * | ---- desired range ---- | - * | state | - * - * Just lock what we found and keep going. - */ - if (state->start == start && state->end <= end) { - set_state_bits(tree, state, bits, NULL); - cache_state(state, cached_state); - state = clear_state_bit(tree, state, clear_bits, 0, NULL); - if (last_end == (u64)-1) - goto out; - start = last_end + 1; - if (start < end && state && state->start == start && - !need_resched()) - goto hit_next; - goto search_again; - } - - /* - * | ---- desired range ---- | - * | state | - * or - * | ------------- state -------------- | - * - * We need to split the extent we found, and may flip bits on second - * half. - * - * If the extent we found extends past our range, we just split and - * search again. It'll get split again the next time though. - * - * If the extent we found is inside our range, we set the desired bit - * on it. - */ - if (state->start < start) { - prealloc = alloc_extent_state_atomic(prealloc); - if (!prealloc) { - err = -ENOMEM; - goto out; - } - err = split_state(tree, state, prealloc, start); - if (err) - extent_io_tree_panic(tree, err); - prealloc = NULL; - if (err) - goto out; - if (state->end <= end) { - set_state_bits(tree, state, bits, NULL); - cache_state(state, cached_state); - state = clear_state_bit(tree, state, clear_bits, 0, NULL); - if (last_end == (u64)-1) - goto out; - start = last_end + 1; - if (start < end && state && state->start == start && - !need_resched()) - goto hit_next; - } - goto search_again; - } - /* - * | ---- desired range ---- | - * | state | or | state | - * - * There's a hole, we need to insert something in it and ignore the - * extent we found. - */ - if (state->start > start) { - u64 this_end; - if (end < last_start) - this_end = end; - else - this_end = last_start - 1; - - prealloc = alloc_extent_state_atomic(prealloc); - if (!prealloc) { - err = -ENOMEM; - goto out; - } - - /* - * Avoid to free 'prealloc' if it can be merged with the later - * extent. - */ - prealloc->start = start; - prealloc->end = this_end; - err = insert_state(tree, prealloc, bits, NULL); - if (err) - extent_io_tree_panic(tree, err); - cache_state(prealloc, cached_state); - prealloc = NULL; - start = this_end + 1; - goto search_again; - } - /* - * | ---- desired range ---- | - * | state | - * - * We need to split the extent, and set the bit on the first half. - */ - if (state->start <= end && state->end > end) { - prealloc = alloc_extent_state_atomic(prealloc); - if (!prealloc) { - err = -ENOMEM; - goto out; - } - - err = split_state(tree, state, prealloc, end + 1); - if (err) - extent_io_tree_panic(tree, err); - - set_state_bits(tree, prealloc, bits, NULL); - cache_state(prealloc, cached_state); - clear_state_bit(tree, prealloc, clear_bits, 0, NULL); - prealloc = NULL; - goto out; - } - -search_again: - if (start > end) - goto out; - spin_unlock(&tree->lock); - cond_resched(); - first_iteration = false; - goto again; - -out: - spin_unlock(&tree->lock); - if (prealloc) - free_extent_state(prealloc); - - return err; -} - /* * Find the first range that has @bits not set. 
This range could start before * @start. diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index 0dae0a16a87cb7..1567ce11c10c9a 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -185,10 +185,6 @@ static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start, EXTENT_DO_ACCOUNTING, cached); } -int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, - u32 bits, u32 clear_bits, - struct extent_state **cached_state); - static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end, u32 extra_bits, struct extent_state **cached_state) diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index aae33b9067b918..4f4a2006b4d5e3 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -2129,47 +2129,6 @@ TRACE_EVENT(btrfs_clear_extent_bit, __print_flags(__entry->clear_bits, "|", EXTENT_FLAGS)) ); -TRACE_EVENT(btrfs_convert_extent_bit, - TP_PROTO(const struct extent_io_tree *tree, - u64 start, u64 len, unsigned set_bits, unsigned clear_bits), - - TP_ARGS(tree, start, len, set_bits, clear_bits), - - TP_STRUCT__entry_btrfs( - __field( unsigned, owner ) - __field( u64, ino ) - __field( u64, rootid ) - __field( u64, start ) - __field( u64, len ) - __field( unsigned, set_bits) - __field( unsigned, clear_bits) - ), - - TP_fast_assign_btrfs(tree->fs_info, - __entry->owner = tree->owner; - if (tree->inode) { - const struct btrfs_inode *inode = tree->inode; - - __entry->ino = btrfs_ino(inode); - __entry->rootid = inode->root->root_key.objectid; - } else { - __entry->ino = 0; - __entry->rootid = 0; - } - __entry->start = start; - __entry->len = len; - __entry->set_bits = set_bits; - __entry->clear_bits = clear_bits; - ), - - TP_printk_btrfs( -"io_tree=%s ino=%llu root=%llu start=%llu len=%llu set_bits=%s clear_bits=%s", - __print_symbolic(__entry->owner, IO_TREE_OWNER), __entry->ino, - __entry->rootid, __entry->start, __entry->len, - __print_flags(__entry->set_bits , "|", EXTENT_FLAGS), - __print_flags(__entry->clear_bits, "|", EXTENT_FLAGS)) -); - DECLARE_EVENT_CLASS(btrfs_dump_space_info, TP_PROTO(struct btrfs_fs_info *fs_info, const struct btrfs_space_info *sinfo), From patchwork Mon May 15 19:22:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13242118 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58484C7EE24 for ; Mon, 15 May 2023 19:23:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245079AbjEOTXb (ORCPT ); Mon, 15 May 2023 15:23:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39098 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245012AbjEOTX2 (ORCPT ); Mon, 15 May 2023 15:23:28 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D4321437E for ; Mon, 15 May 2023 12:23:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; 
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 3/6] btrfs: directly wait for buffer writeback completion in btrfs_wait_buffers
Date: Mon, 15 May 2023 21:22:53 +0200
Message-Id: <20230515192256.29006-4-hch@lst.de>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230515192256.29006-1-hch@lst.de>
References: <20230515192256.29006-1-hch@lst.de>
X-Mailing-List: linux-btrfs@vger.kernel.org

Don't bother going to filemap_fdatawait_range to look up each page in an
extent_buffer and then wait for writeback completion on each page; instead
just call wait_on_extent_buffer_writeback on the buffer.  All write errors
are propagated through the EXTENT_BUFFER_WRITE_ERR bit on the extent_buffer
itself, so there is no need to check for per-page errors either.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/extent_buffer.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_buffer.c b/fs/btrfs/extent_buffer.c index d20fc0d113968f..70a6a5f2e0e0a8 100644 --- a/fs/btrfs/extent_buffer.c +++ b/fs/btrfs/extent_buffer.c @@ -63,19 +63,15 @@ int btrfs_write_buffers(struct btrfs_fs_info *fs_info, struct list_head *list) int btrfs_wait_buffers(struct btrfs_fs_info *fs_info, struct list_head *list) { - struct address_space *mapping = fs_info->btree_inode->i_mapping; struct dirty_buffer *db; - int werr = 0, err; + int werr = 0; while ((db = list_first_entry_or_null(list, struct dirty_buffer, wb_entry))) { list_del_init(&db->wb_entry); - err = filemap_fdatawait_range(mapping, db->eb->start, - db->eb->start + db->eb->len - 1); - if (err) - werr = err; - if (!werr && test_bit(EXTENT_BUFFER_WRITE_ERR, &db->eb->bflags)) + wait_on_extent_buffer_writeback(db->eb); + if (test_bit(EXTENT_BUFFER_WRITE_ERR, &db->eb->bflags)) werr = -EIO; free_extent_buffer(db->eb); kfree(db);

From patchwork Mon May 15 19:22:54 2023
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13242119
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 4/6] btrfs: move dropping the bg reference out of submit_eb_page
Date: Mon, 15 May 2023 21:22:54 +0200
Message-Id: <20230515192256.29006-5-hch@lst.de>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230515192256.29006-1-hch@lst.de>
References: <20230515192256.29006-1-hch@lst.de>
X-Mailing-List: linux-btrfs@vger.kernel.org

Instead of putting the cached bg for zoned metadata writes in submit_eb_page,
let the btrfs_revert_meta_write_pointer and btrfs_schedule_zone_finish_bg
helpers consume it.  This mirrors how the reference to it is acquired in
btrfs_check_meta_write_pointer and isolates the extent_buffer writeback code
from some of the zoned device implementation details.  It also avoids a
reference roundtrip for the case where btrfs_schedule_zone_finish_bg actually
schedules a zone_finish command.

Signed-off-by: Christoph Hellwig
---
 fs/btrfs/extent_io.c | 3 ---
 fs/btrfs/zoned.c     | 7 +++++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d5937ed0962d38..1205b3a3147e7d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1955,8 +1955,6 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, if (!lock_extent_buffer_for_io(eb, wbc)) { btrfs_revert_meta_write_pointer(cache, eb); - if (cache) - btrfs_put_block_group(cache); free_extent_buffer(eb); return 0; } @@ -1965,7 +1963,6 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, * Implies write in zoned mode.
*/ btrfs_schedule_zone_finish_bg(cache, eb); - btrfs_put_block_group(cache); } write_one_eb(eb, wbc); free_extent_buffer(eb); diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index e4b8134ab70166..eed96ec35052a0 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1752,6 +1752,7 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, ASSERT(cache->meta_write_pointer == eb->start + eb->len); cache->meta_write_pointer = eb->start; + btrfs_put_block_group(cache); } int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length) @@ -2145,17 +2146,19 @@ void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg, struct extent_buffer *eb) { if (!test_bit(BLOCK_GROUP_FLAG_SEQUENTIAL_ZONE, &bg->runtime_flags) || - eb->start + eb->len * 2 <= bg->start + bg->zone_capacity) + eb->start + eb->len * 2 <= bg->start + bg->zone_capacity) { + btrfs_put_block_group(bg); return; + } if (WARN_ON(bg->zone_finish_work.func == btrfs_zone_finish_endio_workfn)) { btrfs_err(bg->fs_info, "double scheduling of bg %llu zone finishing", bg->start); + btrfs_put_block_group(bg); return; } /* For the work */ - btrfs_get_block_group(bg); atomic_inc(&eb->refs); bg->last_eb = eb; INIT_WORK(&bg->zone_finish_work, btrfs_zone_finish_endio_workfn); From patchwork Mon May 15 19:22:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13242120 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD40DC7EE24 for ; Mon, 15 May 2023 19:23:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245075AbjEOTXr (ORCPT ); Mon, 15 May 2023 15:23:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39112 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245078AbjEOTXb (ORCPT ); Mon, 15 May 2023 15:23:31 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C895714E59 for ; Mon, 15 May 2023 12:23:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=JAQ7Cj65o/H2xA1SVNn/k6h3P7JM83GskdANJ+0AMWo=; b=mlizXNkSPTmXX3vfIFrBMVXDdO 8/D7SB4cb41hIJ0ChamzoHXqVsgl2ez0xzZ2B4chlbY4MQXg4S1u4I6Fo1xjdaaMetX4i1ceHOkgY yoEQ5GMuMWlqunEH/uCBegcJ8E2xvWroUW1QjyEDjHBDO2W66tkm+AhsHLraxmP2TfH2jVbpKwrPL PbHlHfOxiiApskP3Tpj2WJT2r2xVaKJaZ2a/j3EFfJAD/TpJazgHQWCavARsXmYMXOrSpb+RjZYSZ dkTy8rmTb/GZkw16g3xy7XP9hqLqlWoNWQbYd0oouKxVElrfFTr3bjLcY4ZLcmcj+GogMqtv0Fabx WZX2x+Cg==; Received: from [2001:4bb8:188:2249:ad42:acf3:6731:a041] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1pydn3-003HIL-2H; Mon, 15 May 2023 19:23:26 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: linux-btrfs@vger.kernel.org Subject: [PATCH 5/6] btrfs: move locking and write pointer checking into write_one_eb Date: Mon, 15 May 2023 21:22:55 +0200 Message-Id: <20230515192256.29006-6-hch@lst.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230515192256.29006-1-hch@lst.de> References: 
Currently only submit_eb_page contains the handling for checking against the write pointer on zoned devices. This is fine as there is no support for block size < PAGE_SIZE with zoned devices at the moment, but it prevents us from easily adding new callers of write_one_eb without breaking zoned device support. Move the call to lock_extent_buffer_for_io as well as the write pointer checking and related code into write_one_eb. To be able to do this, write_one_eb now needs to set the eb_context pointer if the caller passed in a non-NULL argument.
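With this change write_one_eb reports its outcome through the return value: a negative errno on failure (for example -EAGAIN when the zoned write pointer check cannot be satisfied for WB_SYNC_ALL writeback), 0 when the buffer was skipped, and a positive value when a write was submitted, which is what lets the submit_eb_subpage and submit_eb_page callers shrink. The snippet below is a small standalone C sketch of how such a tri-state return is consumed; try_write is a made-up stand-in, not the kernel function.

#include <stdio.h>

/*
 * Sketch of the return convention adopted by write_one_eb in this patch:
 *   < 0  error, propagate to the caller,
 *     0  buffer skipped, nothing submitted,
 *   > 0  buffer locked and submitted for writeback.
 */
static int try_write(int idx)
{
	if (idx % 3 == 0)
		return 0;	/* skipped, e.g. the buffer could not be locked */
	return 1;		/* submitted for writeback */
}

int main(void)
{
	int submitted = 0;

	for (int i = 0; i < 6; i++) {
		int ret = try_write(i);

		if (ret < 0)
			return 1;	/* propagate the first error */
		if (ret > 0)
			submitted++;
	}
	printf("submitted %d buffers\n", submitted);
	return 0;
}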
Signed-off-by: Christoph Hellwig --- fs/btrfs/extent_io.c | 69 +++++++++++++++++++++----------------------- 1 file changed, 33 insertions(+), 36 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1205b3a3147e7d..c291fc728db047 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1759,12 +1759,39 @@ static void prepare_eb_write(struct extent_buffer *eb) } } -static noinline_for_stack void write_one_eb(struct extent_buffer *eb, - struct writeback_control *wbc) +static noinline_for_stack int write_one_eb(struct extent_buffer *eb, + struct writeback_control *wbc, + struct extent_buffer **eb_context) { struct btrfs_fs_info *fs_info = eb->fs_info; + struct btrfs_block_group *zoned_bg = NULL; struct btrfs_bio *bbio; + if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &zoned_bg)) { + /* + * If for_sync, this hole will be filled with + * trasnsaction commit. + */ + if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) + return -EAGAIN; + return 0; + } + + if (eb_context) + *eb_context = eb; + + if (!lock_extent_buffer_for_io(eb, wbc)) { + btrfs_revert_meta_write_pointer(zoned_bg, eb); + return 0; + } + + /* + * A non-NULL zoned_bg implies zoned mode and that we are writing the + * last possible block in the zone. + */ + if (zoned_bg) + btrfs_schedule_zone_finish_bg(zoned_bg, eb); + prepare_eb_write(eb); bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES, @@ -1801,6 +1828,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb, } } btrfs_submit_bio(bbio, 0); + return 1; } /* @@ -1868,11 +1896,8 @@ static int submit_eb_subpage(struct page *page, struct writeback_control *wbc) */ if (!eb) continue; - - if (lock_extent_buffer_for_io(eb, wbc)) { - write_one_eb(eb, wbc); + if (write_one_eb(eb, wbc, NULL) > 0) submitted++; - } free_extent_buffer(eb); } return submitted; } @@ -1902,7 +1927,6 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, struct extent_buffer **eb_context) { struct address_space *mapping = page->mapping; - struct btrfs_block_group *cache = NULL; struct extent_buffer *eb; int ret; @@ -1937,36 +1961,9 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, spin_unlock(&mapping->private_lock); if (!ret) return 0; - - if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &cache)) { - /* - * If for_sync, this hole will be filled with - * trasnsaction commit. - */ - if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) - ret = -EAGAIN; - else - ret = 0; - free_extent_buffer(eb); - return ret; - } - - *eb_context = eb; - - if (!lock_extent_buffer_for_io(eb, wbc)) { - btrfs_revert_meta_write_pointer(cache, eb); - free_extent_buffer(eb); - return 0; - } - if (cache) { - /* - * Implies write in zoned mode. Mark the last eb in a block group. - */ - btrfs_schedule_zone_finish_bg(cache, eb); - } - write_one_eb(eb, wbc); + ret = write_one_eb(eb, wbc, eb_context); free_extent_buffer(eb); - return 1; + return ret; } int btree_write_cache_pages(struct address_space *mapping,

From patchwork Mon May 15 19:22:56 2023
From: Christoph Hellwig 
To: Chris Mason , Josef Bacik , David Sterba 
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 6/6] btrfs: bypass filemap_fdatawrite_range in btrfs_write_buffers
Date: Mon, 15 May 2023 21:22:56 +0200
Message-Id: <20230515192256.29006-7-hch@lst.de>
In-Reply-To: <20230515192256.29006-1-hch@lst.de>
References: <20230515192256.29006-1-hch@lst.de>

btrfs_write_buffers already has a pointer to each extent_buffer it needs to write. Instead of going through filemap_fdatawrite_range to look up each page in the buffer, only to cycle back to the buffer to write it, call write_one_eb directly.
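The resulting loop shape (sort the dirty buffers by start offset, hand each one straight to the writeback routine, remember the first error and stop) can be modeled outside the kernel as follows. The array, qsort and the write_buffer helper below stand in for the kernel list, list_sort and write_one_eb and are invented for the sketch.

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a dirty extent buffer tracked on the per-transaction list. */
struct buffer {
	unsigned long long start;
};

static int cmp_start(const void *a, const void *b)
{
	const struct buffer *ba = a, *bb = b;

	if (ba->start < bb->start)
		return -1;
	return ba->start > bb->start;
}

/* Stands in for write_one_eb(); pretend every write succeeds. */
static int write_buffer(const struct buffer *buf)
{
	printf("writing buffer at %llu\n", buf->start);
	return 1;
}

int main(void)
{
	struct buffer bufs[] = { { 65536ULL }, { 16384ULL }, { 32768ULL } };
	const int nr = sizeof(bufs) / sizeof(bufs[0]);
	int err = 0;

	/* Sort by start offset so the device sees mostly sequential writes. */
	qsort(bufs, nr, sizeof(bufs[0]), cmp_start);

	for (int i = 0; i < nr; i++) {
		int ret = write_buffer(&bufs[i]);

		if (ret < 0) {
			err = ret;	/* remember the first error and stop */
			break;
		}
	}
	return err ? 1 : 0;
}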
Signed-off-by: Christoph Hellwig --- fs/btrfs/extent_buffer.c | 17 +++++++++++++---- fs/btrfs/extent_buffer.h | 3 +++ fs/btrfs/extent_io.c | 6 +++--- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent_buffer.c b/fs/btrfs/extent_buffer.c index 70a6a5f2e0e0a8..8d0802d2ab004e 100644 --- a/fs/btrfs/extent_buffer.c +++ b/fs/btrfs/extent_buffer.c @@ -8,6 +8,7 @@ #include "extent_io.h" #include "fs.h" #include "transaction.h" +#include "zoned.h" void btrfs_add_to_dirty_buffers_list(struct extent_buffer *eb, struct btrfs_transaction *trans, @@ -44,20 +45,28 @@ static int extent_buffer_cmp(void *priv, const struct list_head *a, int btrfs_write_buffers(struct btrfs_fs_info *fs_info, struct list_head *list) { - struct address_space *mapping = fs_info->btree_inode->i_mapping; struct dirty_buffer *db; + struct writeback_control wbc = { + .sync_mode = WB_SYNC_ALL, + .nr_to_write = LONG_MAX, + }; int werr = 0, err; list_sort(NULL, list, extent_buffer_cmp); + btrfs_zoned_meta_io_lock(fs_info); list_for_each_entry(db, list, wb_entry) { - err = filemap_fdatawrite_range(mapping, db->eb->start, - db->eb->start + db->eb->len - 1); - if (err) + err = write_one_eb(db->eb, &wbc, NULL); + if (err < 0) { werr = err; + break; + } cond_resched(); } + btrfs_zoned_meta_io_unlock(fs_info); + if (!werr && BTRFS_FS_ERROR(fs_info)) + return -EROFS; return werr; } diff --git a/fs/btrfs/extent_buffer.h b/fs/btrfs/extent_buffer.h index 880b325d7b411a..c521685d7ca85c 100644 --- a/fs/btrfs/extent_buffer.h +++ b/fs/btrfs/extent_buffer.h @@ -7,6 +7,7 @@ struct btrfs_fs_info; struct btrfs_root; struct btrfs_transaction; struct list_head; +struct writeback_control; struct dirty_buffer { struct list_head wb_entry; @@ -19,5 +20,7 @@ void btrfs_add_to_dirty_buffers_list(struct extent_buffer *eb, int btrfs_write_buffers(struct btrfs_fs_info *fs_info, struct list_head *list); int btrfs_wait_buffers(struct btrfs_fs_info *fs_info, struct list_head *list); void btrfs_release_buffers(struct list_head *list); +int write_one_eb(struct extent_buffer *eb, struct writeback_control *wbc, + struct extent_buffer **eb_context); #endif /* BTRFS_EXTENT_BUFFER_H */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c291fc728db047..28078dd4acd99a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -15,6 +15,7 @@ #include #include #include "misc.h" +#include "extent_buffer.h" #include "extent_io.h" #include "extent-io-tree.h" #include "extent_map.h" @@ -1759,9 +1760,8 @@ static void prepare_eb_write(struct extent_buffer *eb) } } -static noinline_for_stack int write_one_eb(struct extent_buffer *eb, - struct writeback_control *wbc, - struct extent_buffer **eb_context) +int write_one_eb(struct extent_buffer *eb, struct writeback_control *wbc, + struct extent_buffer **eb_context) { struct btrfs_fs_info *fs_info = eb->fs_info; struct btrfs_block_group *zoned_bg = NULL;