From patchwork Tue Jan 26 02:24:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58ABBC4332B for ; Tue, 26 Jan 2021 05:29:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A901206F7 for ; Tue, 26 Jan 2021 05:29:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732918AbhAZF2r (ORCPT ); Tue, 26 Jan 2021 00:28:47 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:38250 "EHLO esa6.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727020AbhAZCdP (ORCPT ); Mon, 25 Jan 2021 21:33:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1611628395; x=1643164395; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CrpsAoe8Gzxu8wUVbVAHTRzJ1euPoNodBOhBXGGCcM8=; b=J/7HDQLOBRrxAAZG0jmTSuTWgAOUtpquZgjzhqvXzb6tXKx6j5ZMnHJn T6fjpAVT2shcTc1K6VvucHA3yKZ07WaTqunTAqTjTlzbh2bVXcaF04ioF YYi7OS6SMnW1GCB3wAmFyYhu3UVKPSVig6+iMy9fJysQ2hOgFSStVO74e LAPj1baqJmzDNO312bkXwzRMwgFrXr1EhDZHPaUZ2Dc5mxtbrHZWqwPIc TEpvxVUZOjOvdJ3Irfx2zI4dYVH5Kqyhd6JsjkB7Y5op5geriuymgpSVC SfjG75Rawfv87yv9qkMyYjnv3HuEMjELri9vDEK7Hy+Vj6HH1YYC3ZMWx w==; IronPort-SDR: HRfT05yong1Joz73IW53UncpLr/3nCCRiAWmqND04I+HcHPAWQUpZ1u9DSGayWfcH+YoNpQJEg +kTQICYBhDAeTWSJgyTC3yDpRi5QGJKUAJH3msRfdUOwYZDcFMsUX53Btjv9prMzMMSLZfDNL+ 9R2QM/h+fqvP8yV73aM1fKMBxDgpC9XUXtI25pkJb6orN8Au1ESl6P5LXXZgzf/GjnDshbpfUk BFQ8maHrC/MqOwswSw4VgkIcQL8Fvpf88e60CKJRaBhwnB0yke/3Altw+cFqsz/HzlWcFk9sEf i7E= X-IronPort-AV: E=Sophos;i="5.79,375,1602518400"; d="scan'208";a="159483537" Received: from uls-op-cesaip01.wdc.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Jan 2021 10:26:25 +0800 IronPort-SDR: d8rYqu6uMPK9QmiYkJyyO2EeERL4c89w+/EWkG1SZtCiiyl6ncYV8JSZ/Zdddw3xYro8oAhHjJ tkDTOsfypelUS8ylO6bXHegUdNItbp3/uSicCtOt2llSQEuKaaUhRLWXGM+lsxShQQUazu060V JQyl4+hiPWewZj+iPPamjFnx7xJ2mKiP6h5t0TsGRSbWhRSwzGWNJNYg8wuIGfoH5cpRnQnEdN ZORS3lU4SpVEt9vyoTXKBgPADlvwl9twJoxqDZIfAdRFp99QRpnhtmxVk5KS/e3CqQ/bmDRVD1 gvZyK5/Bs7kBR3YOqhrAG9+Y Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jan 2021 18:10:51 -0800 IronPort-SDR: RhCZIArcRQKN2DdmIeOj8T5zSMuOq+A14OjHBBOV8Y8hNg7ZRV0U45A3+VVY69zkYR7ndQ+HsZ sfQkqc7jsM0fHckqHKvZQ/rLIX3IgaHbf4NzvVGwHGbXeexSjvowb9SzJRMIchWQwdjn76CQ05 fc+rqYPNC0C8d4ntDU8L5VG7WPGoKFAsRQ7nI2dpSPuuqRkOMr+Qcn1lABSFlOuZc+h8saTvHX MJt24NVJw1MweK489WreoDSk3TROXS5cs3OmPlaOB9OcnuGdJG4xmMYpPM5CudEntZ7QS8nPtD Sek= WDCIronportException: Internal Received: from naota.dhcp.fujisawa.hgst.com ([10.149.52.155]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Jan 2021 18:26:23 -0800 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 15/42] btrfs: redirty released extent buffers in ZONED mode Date: Tue, 26 Jan 2021 11:24:53 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED volumes, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks breaks the sequential write sequence expected by the device. This patch introduces a list of clean and unwritten extent buffers that have been released in a transaction. Btrfs redirty the buffer so that btree_write_cache_pages() can send proper bios to the devices. Besides it clears the entire content of the extent buffer not to confuse raw block scanners e.g. btrfsck. By clearing the content, csum_dirty_buffer() complains about bytenr mismatch, so avoid the checking and checksum using newly introduced buffer flag EXTENT_BUFFER_NO_CHECK. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 8 ++++++++ fs/btrfs/extent-tree.c | 12 +++++++++++- fs/btrfs/extent_io.c | 4 ++++ fs/btrfs/extent_io.h | 2 ++ fs/btrfs/transaction.c | 10 ++++++++++ fs/btrfs/transaction.h | 3 +++ fs/btrfs/tree-log.c | 6 ++++++ fs/btrfs/zoned.c | 37 +++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 +++++++ 9 files changed, 88 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 76ab86dacc8d..d530bceb8f9b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -459,6 +459,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec return 0; found_start = btrfs_header_bytenr(eb); + + if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) { + WARN_ON(found_start != 0); + return 0; + } + /* * Please do not consolidate these warnings into a single if. * It is useful to know what went wrong. @@ -4697,6 +4703,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, EXTENT_DIRTY); btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents); + btrfs_free_redirty_list(cur_trans); + cur_trans->state =TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6d6feac90005..36a105697781 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3317,8 +3317,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { ret = check_ref_cleanup(trans, buf->start); - if (!ret) + if (!ret) { + btrfs_redirty_list_add(trans->transaction, buf); goto out; + } } cache = btrfs_lookup_block_group(fs_info, buf->start); @@ -3329,6 +3331,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, goto out; } + if (btrfs_is_zoned(fs_info)) { + btrfs_redirty_list_add(trans->transaction, buf); + pin_down_extent(trans, cache, buf->start, buf->len, 1); + btrfs_put_block_group(cache); + goto out; + } + WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)); btrfs_add_free_space(cache, buf->start, buf->len); @@ -4663,6 +4672,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, __btrfs_tree_lock(buf, nest); btrfs_clean_tree_block(buf); clear_bit(EXTENT_BUFFER_STALE, &buf->bflags); + clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags); set_extent_buffer_uptodate(buf); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2fa563da65bd..d80b3a96ae49 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -24,6 +24,7 @@ #include "rcu-string.h" #include "backref.h" #include "disk-io.h" +#include "zoned.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -5043,6 +5044,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list, &fs_info->allocated_ebs); + INIT_LIST_HEAD(&eb->release_list); spin_lock_init(&eb->refs_lock); atomic_set(&eb->refs, 1); @@ -5830,6 +5832,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv, char *src = (char *)srcv; unsigned long i = get_eb_page_index(start); + WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)); + if (check_eb_range(eb, start, len)) return; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 19221095c635..5a81268c4d8c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -31,6 +31,7 @@ enum { EXTENT_BUFFER_IN_TREE, /* write IO error */ EXTENT_BUFFER_WRITE_ERR, + EXTENT_BUFFER_NO_CHECK, }; /* these are flags for __process_pages_contig */ @@ -93,6 +94,7 @@ struct extent_buffer { struct rw_semaphore lock; struct page *pages[INLINE_EXTENT_BUFFER_PAGES]; + struct list_head release_list; #ifdef CONFIG_BTRFS_DEBUG struct list_head leak_list; #endif diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3bcb5444536e..ef4fcb925cb7 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -21,6 +21,7 @@ #include "qgroup.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" #define BTRFS_ROOT_TRANS_TAG 0 @@ -375,6 +376,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->dirty_bgs_lock); INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); + INIT_LIST_HEAD(&cur_trans->releasing_ebs); + spin_lock_init(&cur_trans->releasing_ebs_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(fs_info, &cur_trans->dirty_pages, IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode); @@ -2336,6 +2339,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) goto scrub_continue; } + /* + * At this point, we should have written all the tree blocks + * allocated in this transaction. So it's now safe to free the + * redirtyied extent buffers. + */ + btrfs_free_redirty_list(cur_trans); + ret = write_all_supers(fs_info, 0); /* * the super is written, we can safely allow the tree-loggers diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 31ca81bad822..660b4e1f1181 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -92,6 +92,9 @@ struct btrfs_transaction { */ atomic_t pending_ordered; wait_queue_head_t pending_wait; + + spinlock_t releasing_ebs_lock; + struct list_head releasing_ebs; }; #define __TRANS_FREEZABLE (1U << 0) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 8ee0700a980f..930e752686b4 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -19,6 +19,7 @@ #include "qgroup.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" /* magic values for the inode_only field in btrfs_log_inode: * @@ -2752,6 +2753,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans, free_extent_buffer(next); return ret; } + btrfs_redirty_list_add( + trans->transaction, next); } else { if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags)) clear_extent_buffer_dirty(next); @@ -3296,6 +3299,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans, clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); extent_io_tree_release(&log->log_csum_range); + + if (trans && log->node) + btrfs_redirty_list_add(trans->transaction, log->node); btrfs_put_root(log); } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index f6b52de4abca..db6cb0070220 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -10,6 +10,7 @@ #include "rcu-string.h" #include "disk-io.h" #include "block-group.h" +#include "transaction.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1186,3 +1187,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) */ btrfs_free_excluded_extents(cache); } + +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) +{ + struct btrfs_fs_info *fs_info = eb->fs_info; + + if (!btrfs_is_zoned(fs_info) || + btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) || + !list_empty(&eb->release_list)) + return; + + set_extent_buffer_dirty(eb); + set_extent_bits_nowait(&trans->dirty_pages, eb->start, + eb->start + eb->len - 1, EXTENT_DIRTY); + memzero_extent_buffer(eb, 0, eb->len); + set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags); + + spin_lock(&trans->releasing_ebs_lock); + list_add_tail(&eb->release_list, &trans->releasing_ebs); + spin_unlock(&trans->releasing_ebs_lock); + atomic_inc(&eb->refs); +} + +void btrfs_free_redirty_list(struct btrfs_transaction *trans) +{ + spin_lock(&trans->releasing_ebs_lock); + while (!list_empty(&trans->releasing_ebs)) { + struct extent_buffer *eb; + + eb = list_first_entry(&trans->releasing_ebs, + struct extent_buffer, release_list); + list_del_init(&eb->release_list); + free_extent_buffer(eb); + } + spin_unlock(&trans->releasing_ebs_lock); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 0cc0b27e9437..b2ce16de0c22 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -43,6 +43,9 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb); +void btrfs_free_redirty_list(struct btrfs_transaction *trans); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -127,6 +130,10 @@ static inline int btrfs_load_block_group_zone_info( static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } +static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb) { } +static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)