From patchwork Fri Jun 21 12:20:57 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhiyong Wu
X-Patchwork-Id: 2761911
From: zwu.kernel@gmail.com
To: linux-btrfs@vger.kernel.org
Cc: viro@zeniv.linux.org.uk, sekharan@us.ibm.com, linuxram@us.ibm.com,
	david@fromorbit.com, chris.mason@fusionio.com, jbacik@fusionio.com,
	idryomov@gmail.com, Martin@lichtvoll.de, Zhi Yong Wu
Subject: [RFC PATCH v2 2/5] BTRFS hot reloc: add one new block group
Date: Fri, 21 Jun 2013 20:20:57 +0800
Message-Id: <1371817260-8615-3-git-send-email-zwu.kernel@gmail.com>
X-Mailer: git-send-email 1.7.11.7
In-Reply-To: <1371817260-8615-1-git-send-email-zwu.kernel@gmail.com>
References: <1371817260-8615-1-git-send-email-zwu.kernel@gmail.com>
X-Mailing-List: linux-btrfs@vger.kernel.org
From: Zhi Yong Wu

Introduce one new block group, BTRFS_BLOCK_GROUP_DATA_NONROT, which is
used to differentiate whether block space is reserved and allocated
from a rotating disk or a nonrotating disk.

Signed-off-by: Zhi Yong Wu
---
 fs/btrfs/ctree.h            |  33 +++++++++++---
 fs/btrfs/extent-tree.c      |  99 +++++++++++++++++++++++++++++++++--------
 fs/btrfs/extent_io.c        |  51 ++++++++++++++++++++-
 fs/btrfs/extent_io.h        |   7 +++
 fs/btrfs/file.c             |  27 +++++++----
 fs/btrfs/free-space-cache.c |   2 +-
 fs/btrfs/inode-map.c        |   7 +--
 fs/btrfs/inode.c            | 106 +++++++++++++++++++++++++++++++++---------
 fs/btrfs/ioctl.c            |  17 ++++---
 fs/btrfs/relocation.c       |   6 ++-
 fs/btrfs/super.c            |   4 +-
 fs/btrfs/volumes.c          |  29 +++++++++++-
 12 files changed, 318 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 745cac4..1c11be1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -963,6 +963,12 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5		(1 << 7)
 #define BTRFS_BLOCK_GROUP_RAID6		(1 << 8)
+/*
+ * New block groups for use with BTRFS hot relocation feature.
+ * When BTRFS hot relocation is enabled, *_NONROT block group is
+ * forced to nonrotating drives.
+ */
+#define BTRFS_BLOCK_GROUP_DATA_NONROT	(1ULL << 9)
 #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
 
 enum btrfs_raid_types {
@@ -978,7 +984,8 @@ enum btrfs_raid_types {
 
 #define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
 					 BTRFS_BLOCK_GROUP_SYSTEM |  \
-					 BTRFS_BLOCK_GROUP_METADATA)
+					 BTRFS_BLOCK_GROUP_METADATA | \
+					 BTRFS_BLOCK_GROUP_DATA_NONROT)
 
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
@@ -1521,6 +1528,7 @@ struct btrfs_fs_info {
 
 	struct list_head space_info;
 	struct btrfs_space_info *data_sinfo;
+	struct btrfs_space_info *nonrot_data_sinfo;
 
 	struct reloc_control *reloc_ctl;
 
@@ -1545,6 +1553,7 @@ struct btrfs_fs_info {
 	u64 avail_data_alloc_bits;
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
+	u64 avail_data_nonrot_alloc_bits;
 
 	/* restriper state */
 	spinlock_t balance_lock;
@@ -1557,6 +1566,7 @@ struct btrfs_fs_info {
 
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
+	unsigned data_nonrot_chunk_allocations;
 
 	void *bdev_holder;
 
@@ -1928,6 +1938,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
 #define BTRFS_MOUNT_HOT_TRACK		(1 << 23)
+#define BTRFS_MOUNT_HOT_MOVE		(1 << 24)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
@@ -3043,6 +3054,8 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
 			  u64 objectid, u64 offset, u64 bytenr);
+struct btrfs_block_group_cache *btrfs_lookup_first_block_group(
+				struct btrfs_fs_info *info, u64 bytenr);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(
 					struct btrfs_fs_info *info,
 					u64 bytenr);
@@ -3093,6 +3106,8 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_root *root,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid,
 			 u64 owner, u64 offset, int for_cow);
+struct btrfs_block_group_cache *next_block_group(struct btrfs_root *root,
+				 struct btrfs_block_group_cache *cache);
 int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root);
@@ -3122,8 +3137,14 @@ enum btrfs_reserve_flush_enum {
 	BTRFS_RESERVE_FLUSH_ALL,
 };
 
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes);
-void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
+enum {
+	TYPE_ROT = 0,	/* rot -> rotating */
+	TYPE_NONROT,	/* nonrot -> nonrotating */
+	MAX_RELOC_TYPES,
+};
+
+int btrfs_check_data_free_space(struct inode *inode, u64 bytes, int *flag);
+void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes, int flag);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 				  struct btrfs_root *root);
 int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
@@ -3138,8 +3159,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 				      u64 qgroup_reserved);
 int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes, int *flag);
+void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes, int flag);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
 					      unsigned short type);
@@ -3612,7 +3633,7 @@ int btrfs_release_file(struct inode *inode, struct file *file);
 int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 		      struct page **pages, size_t num_pages,
 		      loff_t pos, size_t write_bytes,
-		      struct extent_state **cached);
+		      struct extent_state **cached, int flag);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct
 			btrfs_trans_handle *trans,

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df472ab..a7b3044 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -628,7 +628,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 /*
  * return the block group that starts at or after bytenr
  */
-static struct btrfs_block_group_cache *
+struct btrfs_block_group_cache *
 btrfs_lookup_first_block_group(struct btrfs_fs_info *info, u64 bytenr)
 {
 	struct btrfs_block_group_cache *cache;
@@ -3027,7 +3027,7 @@ fail:
 
 }
 
-static struct btrfs_block_group_cache *
+struct btrfs_block_group_cache *
 next_block_group(struct btrfs_root *root,
 		 struct btrfs_block_group_cache *cache)
 {
@@ -3056,6 +3056,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	int num_pages = 0;
 	int retries = 0;
 	int ret = 0;
+	int flag = TYPE_ROT;
 
 	/*
 	 * If this block group is smaller than 100 megs don't bother caching the
@@ -3144,7 +3145,7 @@ again:
 	num_pages *= 16;
 	num_pages *= PAGE_CACHE_SIZE;
 
-	ret = btrfs_check_data_free_space(inode, num_pages);
+	ret = btrfs_check_data_free_space(inode, num_pages, &flag);
 	if (ret)
 		goto out_put;
 
@@ -3153,7 +3154,8 @@ again:
 					      &alloc_hint);
 	if (!ret)
 		dcs = BTRFS_DC_SETUP;
-	btrfs_free_reserved_data_space(inode, num_pages);
+
+	btrfs_free_reserved_data_space(inode, num_pages, flag);
 
 out_put:
 	iput(inode);
@@ -3355,6 +3357,8 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 	list_add_rcu(&found->list, &info->space_info);
 	if (flags & BTRFS_BLOCK_GROUP_DATA)
 		info->data_sinfo = found;
+	else if (flags & BTRFS_BLOCK_GROUP_DATA_NONROT)
+		info->nonrot_data_sinfo = found;
 
 	return 0;
 }
@@ -3370,6 +3374,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 		fs_info->avail_metadata_alloc_bits |= extra_flags;
 	if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
 		fs_info->avail_system_alloc_bits |= extra_flags;
+	if (flags & BTRFS_BLOCK_GROUP_DATA_NONROT)
+		fs_info->avail_data_nonrot_alloc_bits |= extra_flags;
 	write_sequnlock(&fs_info->profiles_lock);
 }
 
@@ -3476,18 +3482,27 @@ static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
 			flags |= root->fs_info->avail_system_alloc_bits;
 		else if (flags & BTRFS_BLOCK_GROUP_METADATA)
 			flags |= root->fs_info->avail_metadata_alloc_bits;
+		else if (flags & BTRFS_BLOCK_GROUP_DATA_NONROT)
+			flags |= root->fs_info->avail_data_nonrot_alloc_bits;
 	} while (read_seqretry(&root->fs_info->profiles_lock, seq));
 
 	return btrfs_reduce_alloc_profile(root, flags);
 }
 
+/*
+ * Turns a chunk_type integer into set of block group flags (a profile).
+ * Hot relocation code adds chunk_type 2 for hot data specific block
+ * group type.
+ */
 u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 {
 	u64 flags;
 	u64 ret;
 
-	if (data)
+	if (data == 1)
 		flags = BTRFS_BLOCK_GROUP_DATA;
+	else if (data == 2)
+		flags = BTRFS_BLOCK_GROUP_DATA_NONROT;
 	else if (root == root->fs_info->chunk_root)
 		flags = BTRFS_BLOCK_GROUP_SYSTEM;
 	else
@@ -3501,13 +3516,14 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
  * This will check the space that the inode allocates from to make sure we have
  * enough space for bytes.
  */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
+int btrfs_check_data_free_space(struct inode *inode, u64 bytes, int *flag)
 {
 	struct btrfs_space_info *data_sinfo;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
 	int ret = 0, committed = 0, alloc_chunk = 1;
+	int data, tried = 0;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -3518,7 +3534,15 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 		committed = 1;
 	}
 
-	data_sinfo = fs_info->data_sinfo;
+	if (*flag == TYPE_NONROT) {
+try_nonrot:
+		data = 2;
+		data_sinfo = fs_info->nonrot_data_sinfo;
+	} else {
+		data = 1;
+		data_sinfo = fs_info->data_sinfo;
+	}
+
 	if (!data_sinfo)
 		goto alloc;
 
@@ -3536,13 +3560,22 @@ again:
 		 * if we don't have enough free bytes in this space then we need
 		 * to alloc a new chunk.
 		 */
-		if (!data_sinfo->full && alloc_chunk) {
+		if (alloc_chunk) {
 			u64 alloc_target;
 
+			if (data_sinfo->full) {
+				if (!tried) {
+					tried = 1;
+					spin_unlock(&data_sinfo->lock);
+					goto try_nonrot;
+				} else
+					goto non_alloc;
+			}
+
 			data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
 			spin_unlock(&data_sinfo->lock);
 alloc:
-			alloc_target = btrfs_get_alloc_profile(root, 1);
+			alloc_target = btrfs_get_alloc_profile(root, data);
 			trans = btrfs_join_transaction(root);
 			if (IS_ERR(trans))
 				return PTR_ERR(trans);
@@ -3559,11 +3592,13 @@ alloc:
 			}
 
 			if (!data_sinfo)
-				data_sinfo = fs_info->data_sinfo;
+				data_sinfo = (data == 1) ? fs_info->data_sinfo :
+					fs_info->nonrot_data_sinfo;
 
 			goto again;
 		}
 
+non_alloc:
 		/*
 		 * If we have less pinned bytes than we want to allocate then
 		 * don't bother committing the transaction, it won't help us.
@@ -3574,7 +3609,7 @@ alloc:
 
 	/* commit the current transaction and try again */
 commit_trans:
-	if (!committed &&
+	if (!committed && data_sinfo &&
 	    !atomic_read(&root->fs_info->open_ioctl_trans)) {
 		committed = 1;
 		trans = btrfs_join_transaction(root);
@@ -3588,6 +3623,10 @@ commit_trans:
 
 		return -ENOSPC;
 	}
+
+	if (tried)
+		*flag = TYPE_NONROT;
+
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
@@ -3599,7 +3638,7 @@ commit_trans:
 /*
  * Called if we need to clear a data reservation for this inode.
  */
-void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
+void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes, int flag)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_space_info *data_sinfo;
@@ -3607,7 +3646,10 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
 
-	data_sinfo = root->fs_info->data_sinfo;
+	if (flag == TYPE_NONROT)
+		data_sinfo = root->fs_info->nonrot_data_sinfo;
+	else
+		data_sinfo = root->fs_info->data_sinfo;
 	spin_lock(&data_sinfo->lock);
 	data_sinfo->bytes_may_use -= bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
@@ -3791,6 +3833,13 @@ again:
 			force_metadata_allocation(fs_info);
 	}
 
+	if (flags & BTRFS_BLOCK_GROUP_DATA_NONROT && fs_info->metadata_ratio) {
+		fs_info->data_nonrot_chunk_allocations++;
+		if (!(fs_info->data_nonrot_chunk_allocations %
+		      fs_info->metadata_ratio))
+			force_metadata_allocation(fs_info);
+	}
+
 	/*
 	 * Check if we have enough space in SYSTEM chunk because we may need
 	 * to update devices.
@@ -4497,6 +4546,13 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
 	meta_used = sinfo->bytes_used;
 	spin_unlock(&sinfo->lock);
 
+	sinfo = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA_NONROT);
+	if (sinfo) {
+		spin_lock(&sinfo->lock);
+		data_used += sinfo->bytes_used;
+		spin_unlock(&sinfo->lock);
+	}
+
 	num_bytes = (data_used >> fs_info->sb->s_blocksize_bits) *
 		    csum_size * 2;
 	num_bytes += div64_u64(data_used + meta_used, 50);
@@ -4972,6 +5028,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
  * btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
  * @inode: inode we're writing to
  * @num_bytes: the number of bytes we want to allocate
+ * @flag: indicate if block space is reserved from rotating disk or not
  *
  * This will do the following things
  *
@@ -4983,17 +5040,17 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
  *
  * This will return 0 for success and -ENOSPC if there is no space left.
  */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes, int *flag)
 {
 	int ret;
 
-	ret = btrfs_check_data_free_space(inode, num_bytes);
+	ret = btrfs_check_data_free_space(inode, num_bytes, flag);
 	if (ret)
 		return ret;
 
 	ret = btrfs_delalloc_reserve_metadata(inode, num_bytes);
 	if (ret) {
-		btrfs_free_reserved_data_space(inode, num_bytes);
+		btrfs_free_reserved_data_space(inode, num_bytes, *flag);
 		return ret;
 	}
 
@@ -5004,6 +5061,7 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
  * btrfs_delalloc_release_space - release data and metadata space for delalloc
  * @inode: inode we're releasing space for
  * @num_bytes: the number of bytes we want to free up
+ * @flag: indicate if block space is freed from rotating disk or not
 *
 * This must be matched with a call to btrfs_delalloc_reserve_space.
 * This is called in the case that we don't need the metadata AND data
 * reservations
@@ -5013,10 +5071,10 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
 * decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
 * list if there are no delalloc bytes left.
 */
-void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes)
+void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes, int flag)
 {
 	btrfs_delalloc_release_metadata(inode, num_bytes);
-	btrfs_free_reserved_data_space(inode, num_bytes);
+	btrfs_free_reserved_data_space(inode, num_bytes, flag);
 }
 
 static int update_block_group(struct btrfs_root *root,
@@ -5892,7 +5950,8 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_space_info *space_info;
 	int loop = 0;
 	int index = __get_raid_index(flags);
-	int alloc_type = (flags & BTRFS_BLOCK_GROUP_DATA) ?
+	int alloc_type = ((flags & BTRFS_BLOCK_GROUP_DATA)
+			|| (flags & BTRFS_BLOCK_GROUP_DATA_NONROT)) ?
 		RESERVE_ALLOC_NO_ACCOUNT : RESERVE_ALLOC;
 	bool found_uncached_bg = false;
 	bool failed_cluster_refill = false;
@@ -8366,6 +8425,8 @@ static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 		fs_info->avail_metadata_alloc_bits &= ~extra_flags;
 	if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
 		fs_info->avail_system_alloc_bits &= ~extra_flags;
+	if (flags & BTRFS_BLOCK_GROUP_DATA_NONROT)
+		fs_info->avail_data_nonrot_alloc_bits &= ~extra_flags;
 	write_sequnlock(&fs_info->profiles_lock);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e7e7afb..6fbfc90 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1229,6 +1229,26 @@ int clear_extent_uptodate(struct extent_io_tree *tree, u64 start, u64 end,
 				cached_state, mask);
 }
 
+void set_extent_hot(struct inode *inode, u64 start, u64 end,
+		    struct extent_state **cached_state,
+		    int type, int flag)
+{
+	int bits = (type == TYPE_NONROT) ?
+			EXTENT_HOT : EXTENT_COLD;
+
+	if (flag) {
+		bits |= EXTENT_DELALLOC | EXTENT_UPTODATE;
+
+		clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
+				 EXTENT_DIRTY | EXTENT_DELALLOC |
+				 EXTENT_DO_ACCOUNTING |
+				 EXTENT_HOT | EXTENT_COLD,
+				 0, 0, cached_state, GFP_NOFS);
+	}
+
+	set_extent_bit(&BTRFS_I(inode)->io_tree, start,
+		       end, bits, NULL, cached_state, GFP_NOFS);
+}
+
 /*
  * either insert or lock state struct between start and end use mask to tell
  * us if waiting is desired.
  */
@@ -1430,9 +1450,11 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
 {
 	struct rb_node *node;
 	struct extent_state *state;
+	struct btrfs_root *root;
 	u64 cur_start = *start;
 	u64 found = 0;
 	u64 total_bytes = 0;
+	int flag = EXTENT_DELALLOC;
 
 	spin_lock(&tree->lock);
 
@@ -1447,13 +1469,27 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
 		goto out;
 	}
 
+	root = BTRFS_I(tree->mapping->host)->root;
 	while (1) {
 		state = rb_entry(node, struct extent_state, rb_node);
 		if (found && (state->start != cur_start ||
 			      (state->state & EXTENT_BOUNDARY))) {
 			goto out;
 		}
-		if (!(state->state & EXTENT_DELALLOC)) {
+		if (btrfs_test_opt(root, HOT_MOVE)) {
+			if (!(state->state & EXTENT_DELALLOC) ||
+			    (!(state->state & EXTENT_HOT) &&
+			     !(state->state & EXTENT_COLD))) {
+				if (!found)
+					*end = state->end;
+				goto out;
+			} else {
+				if (!found)
+					flag = (state->state & EXTENT_HOT) ?
+						EXTENT_HOT : EXTENT_COLD;
+			}
+		}
+		if (!(state->state & flag)) {
 			if (!found)
 				*end = state->end;
 			goto out;
 		}
@@ -1640,7 +1676,13 @@ again:
 	lock_extent_bits(tree, delalloc_start, delalloc_end, 0, &cached_state);
 
 	/* then test to make sure it is all still delalloc */
-	ret = test_range_bit(tree, delalloc_start, delalloc_end,
+	if (btrfs_test_opt(BTRFS_I(inode)->root, HOT_MOVE)) {
+		ret = test_range_bit(tree, delalloc_start, delalloc_end,
+			EXTENT_DELALLOC | EXTENT_HOT, 1, cached_state) ||
+		      test_range_bit(tree, delalloc_start, delalloc_end,
+			EXTENT_DELALLOC | EXTENT_COLD, 1, cached_state);
+	} else
+		ret = test_range_bit(tree, delalloc_start, delalloc_end,
 			     EXTENT_DELALLOC, 1, cached_state);
 	if (!ret) {
 		unlock_extent_cached(tree, delalloc_start, delalloc_end,
@@ -1678,6 +1720,11 @@ int extent_clear_unlock_delalloc(struct inode *inode,
 	if (op & EXTENT_CLEAR_DELALLOC)
 		clear_bits |= EXTENT_DELALLOC;
 
+	if (op & EXTENT_CLEAR_HOT)
+		clear_bits |= EXTENT_HOT;
+	if (op & EXTENT_CLEAR_COLD)
+		clear_bits |= EXTENT_COLD;
+
 	clear_extent_bit(tree, start, end, clear_bits, 1, 0, NULL, GFP_NOFS);
 	if (!(op & (EXTENT_CLEAR_UNLOCK_PAGE | EXTENT_CLEAR_DIRTY |
 		    EXTENT_SET_WRITEBACK | EXTENT_END_WRITEBACK |
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 41fb81e..ef63452 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -19,6 +19,8 @@
 #define EXTENT_FIRST_DELALLOC (1 << 12)
 #define EXTENT_NEED_WAIT (1 << 13)
 #define EXTENT_DAMAGED (1 << 14)
+#define EXTENT_HOT (1 << 15)
+#define EXTENT_COLD (1 << 16)
 #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS (EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
@@ -51,6 +53,8 @@
 #define EXTENT_END_WRITEBACK	 0x20
 #define EXTENT_SET_PRIVATE2	 0x40
 #define EXTENT_CLEAR_ACCOUNTING  0x80
+#define EXTENT_CLEAR_HOT	0x100
+#define EXTENT_CLEAR_COLD	0x200
 
 /*
  * page->private values.
  * Every page that is controlled by the extent
@@ -237,6 +241,9 @@ int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
 		      struct extent_state **cached_state, gfp_t mask);
 int set_extent_defrag(struct extent_io_tree *tree, u64 start, u64 end,
 		      struct extent_state **cached_state, gfp_t mask);
+void set_extent_hot(struct inode *inode, u64 start, u64 end,
+		    struct extent_state **cached_state,
+		    int type, int flag);
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned long bits,
 			  struct extent_state **cached_state);
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4205ba7..e3c58c4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -41,6 +41,7 @@
 #include "locking.h"
 #include "compat.h"
 #include "volumes.h"
+#include "hot_relocate.h"
 
 static struct kmem_cache *btrfs_inode_defrag_cachep;
 /*
@@ -500,7 +501,7 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 		      struct page **pages, size_t num_pages,
 		      loff_t pos, size_t write_bytes,
-		      struct extent_state **cached)
+		      struct extent_state **cached, int flag)
 {
 	int err = 0;
 	int i;
@@ -514,6 +515,11 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 	num_bytes = ALIGN(write_bytes + pos - start_pos, root->sectorsize);
 
 	end_of_last_block = start_pos + num_bytes - 1;
+
+	if (btrfs_test_opt(root, HOT_MOVE))
+		set_extent_hot(inode, start_pos, end_of_last_block,
+			       cached, flag, 0);
+
 	err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
 					cached);
 	if (err)
@@ -1294,7 +1300,8 @@ again:
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos,
 				 last_pos - 1, EXTENT_DIRTY | EXTENT_DELALLOC |
-				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
+				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG |
+				 EXTENT_HOT | EXTENT_COLD,
 				 0, 0, &cached_state, GFP_NOFS);
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 				     start_pos, last_pos - 1, &cached_state,
@@ -1350,6 +1357,7 @@ static noinline ssize_t
__btrfs_buffered_write(struct file *file,
 			     PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 		size_t dirty_pages;
 		size_t copied;
+		int flag = TYPE_ROT;
 
 		WARN_ON(num_pages > nrptrs);
 
@@ -1363,7 +1371,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		}
 
 		ret = btrfs_delalloc_reserve_space(inode,
-					num_pages << PAGE_CACHE_SHIFT);
+					num_pages << PAGE_CACHE_SHIFT, &flag);
 		if (ret)
 			break;
 
@@ -1377,7 +1385,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 					      force_page_uptodate);
 		if (ret) {
 			btrfs_delalloc_release_space(inode,
-					num_pages << PAGE_CACHE_SHIFT);
+					num_pages << PAGE_CACHE_SHIFT, flag);
 			break;
 		}
 
@@ -1416,16 +1424,16 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			}
 			btrfs_delalloc_release_space(inode,
 					(num_pages - dirty_pages) <<
-					PAGE_CACHE_SHIFT);
+					PAGE_CACHE_SHIFT, flag);
 		}
 
 		if (copied > 0) {
 			ret = btrfs_dirty_pages(root, inode, pages,
 						dirty_pages, pos, copied,
-						NULL);
+						NULL, flag);
 			if (ret) {
 				btrfs_delalloc_release_space(inode,
-					dirty_pages << PAGE_CACHE_SHIFT);
+					dirty_pages << PAGE_CACHE_SHIFT, flag);
 				btrfs_drop_pages(pages, num_pages);
 				break;
 			}
@@ -2150,6 +2158,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	u64 locked_end;
 	struct extent_map *em;
 	int blocksize = BTRFS_I(inode)->root->sectorsize;
+	int flag = TYPE_ROT;
 	int ret;
 
 	alloc_start = round_down(offset, blocksize);
@@ -2166,7 +2175,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	 * Make sure we have enough space before we do the
 	 * allocation.
 	 */
-	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start);
+	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start, &flag);
 	if (ret)
 		return ret;
 	if (root->fs_info->quota_enabled) {
@@ -2281,7 +2290,7 @@ out:
 		btrfs_qgroup_free(root, alloc_end - alloc_start);
out_reserve_fail:
 	/* Let go of our reservation.
 	 */
-	btrfs_free_reserved_data_space(inode, alloc_end - alloc_start);
+	btrfs_free_reserved_data_space(inode, alloc_end - alloc_start, flag);
 	return ret;
 }
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e530096..18f0467 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1004,7 +1004,7 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	io_ctl_zero_remaining_pages(&io_ctl);
 
 	ret = btrfs_dirty_pages(root, inode, io_ctl.pages, io_ctl.num_pages,
-				0, i_size_read(inode), &cached_state);
+				0, i_size_read(inode), &cached_state, TYPE_ROT);
 	io_ctl_drop_pages(&io_ctl);
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, 0,
 			     i_size_read(inode) - 1, &cached_state, GFP_NOFS);
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 2c66ddb..0f8268c 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -403,6 +403,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	u64 alloc_hint = 0;
 	int ret;
 	int prealloc;
+	int flag = TYPE_ROT;
 	bool retry = false;
 
 	/* only fs tree and subvol/snap needs ino cache */
@@ -492,17 +493,17 @@ again:
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_CACHE_SIZE;
 
-	ret = btrfs_delalloc_reserve_space(inode, prealloc);
+	ret = btrfs_delalloc_reserve_space(inode, prealloc, &flag);
 	if (ret)
 		goto out_put;
 
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_space(inode, prealloc);
+		btrfs_delalloc_release_space(inode, prealloc, flag);
 		goto out_put;
 	}
-	btrfs_free_reserved_data_space(inode, prealloc);
+	btrfs_free_reserved_data_space(inode, prealloc, flag);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path);
 out_put:
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17f3064..437d20f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -57,6 +57,7 @@
 #include "free-space-cache.h"
 #include "inode-map.h"
 #include "backref.h"
+#include "hot_relocate.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -106,6 +107,27 @@ static struct extent_map *create_pinned_em(struct inode *inode, u64 start,
 
 static int btrfs_dirty_inode(struct inode *inode);
 
+static int get_chunk_type(struct inode *inode, u64 start, u64 end)
+{
+	int hot, cold, ret = 1;
+
+	hot = test_range_bit(&BTRFS_I(inode)->io_tree,
+			start, end, EXTENT_HOT, 1, NULL);
+	cold = test_range_bit(&BTRFS_I(inode)->io_tree,
+			start, end, EXTENT_COLD, 1, NULL);
+
+	WARN_ON(hot && cold);
+
+	if (hot)
+		ret = 2;
+	else if (cold)
+		ret = 1;
+	else
+		WARN_ON(1);
+
+	return ret;
+}
+
 static int btrfs_init_inode_security(struct btrfs_trans_handle *trans,
 				     struct inode *inode, struct inode *dir,
 				     const struct qstr *qstr)
@@ -861,13 +883,14 @@ static noinline int __cow_file_range(struct btrfs_trans_handle *trans,
 {
 	u64 alloc_hint = 0;
 	u64 num_bytes;
-	unsigned long ram_size;
+	unsigned long ram_size, hot_flag = 0;
 	u64 disk_num_bytes;
 	u64 cur_alloc_size;
 	u64 blocksize = root->sectorsize;
 	struct btrfs_key ins;
 	struct extent_map *em;
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	int chunk_type = 1;
 	int ret = 0;
 
 	BUG_ON(btrfs_is_free_space_inode(inode));
@@ -875,6 +898,7 @@ static noinline int __cow_file_range(struct btrfs_trans_handle *trans,
 	num_bytes = ALIGN(end - start + 1, blocksize);
 	num_bytes = max(blocksize, num_bytes);
 	disk_num_bytes = num_bytes;
+	ret = 0;
 
 	/* if this is a small write inside eof, kick off defrag */
 	if (num_bytes < 64 * 1024 &&
@@ -894,7 +918,8 @@ static noinline int __cow_file_range(struct btrfs_trans_handle *trans,
 				     EXTENT_CLEAR_DELALLOC |
 				     EXTENT_CLEAR_DIRTY |
 				     EXTENT_SET_WRITEBACK |
-				     EXTENT_END_WRITEBACK);
+				     EXTENT_END_WRITEBACK |
+				     hot_flag);
 
 		*nr_written = *nr_written +
 		     (end - start + PAGE_CACHE_SIZE) / PAGE_CACHE_SIZE;
@@ -916,9 +941,25 @@ static noinline int __cow_file_range(struct btrfs_trans_handle *trans,
 		unsigned long op;
 
 		cur_alloc_size = disk_num_bytes;
+
+		/*
+		 * Use COW operations to move hot data to SSD and cold data
+		 * back to
+		 * rotating disk. Sets chunk_type to 1 to indicate
+		 * to write to BTRFS_BLOCK_GROUP_DATA or 2 to indicate
+		 * BTRFS_BLOCK_GROUP_DATA_NONROT.
+		 */
+		if (btrfs_test_opt(root, HOT_MOVE)) {
+			chunk_type = get_chunk_type(inode, start,
+					start + cur_alloc_size - 1);
+			if (chunk_type == 2)
+				hot_flag = EXTENT_CLEAR_HOT;
+			else
+				hot_flag = EXTENT_CLEAR_COLD;
+		}
+
 		ret = btrfs_reserve_extent(trans, root, cur_alloc_size,
 					   root->sectorsize, 0, alloc_hint,
-					   &ins, 1);
+					   &ins, chunk_type);
 		if (ret < 0) {
 			btrfs_abort_transaction(trans, root, ret);
 			goto out_unlock;
@@ -986,7 +1027,7 @@ static noinline int __cow_file_range(struct btrfs_trans_handle *trans,
 		 */
 		op = unlock ? EXTENT_CLEAR_UNLOCK_PAGE : 0;
 		op |= EXTENT_CLEAR_UNLOCK | EXTENT_CLEAR_DELALLOC |
-			EXTENT_SET_PRIVATE2;
+			EXTENT_SET_PRIVATE2 | hot_flag;
 
 		extent_clear_unlock_delalloc(inode, &BTRFS_I(inode)->io_tree,
 					     start, start + ram_size - 1,
@@ -1010,7 +1051,8 @@ out_unlock:
 		     EXTENT_CLEAR_DELALLOC |
 		     EXTENT_CLEAR_DIRTY |
 		     EXTENT_SET_WRITEBACK |
-		     EXTENT_END_WRITEBACK);
+		     EXTENT_END_WRITEBACK |
+		     hot_flag);
 	goto out;
 }
@@ -1604,8 +1646,12 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 			btrfs_delalloc_release_metadata(inode, len);
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
-		    && do_list)
-			btrfs_free_reserved_data_space(inode, len);
+		    && do_list) {
+			int flag = TYPE_ROT;
+			if ((state->state & EXTENT_HOT) && (*bits & EXTENT_HOT))
+				flag = TYPE_NONROT;
+			btrfs_free_reserved_data_space(inode, len, flag);
+		}
 
 		__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
 				     root->fs_info->delalloc_batch);
@@ -1800,6 +1846,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	u64 page_start;
 	u64 page_end;
 	int ret;
+	int flag = TYPE_ROT;
 
 	fixup = container_of(work, struct btrfs_writepage_fixup, work);
 	page = fixup->page;
@@ -1831,7 +1878,7 @@ again:
 		goto again;
 	}
 
-	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE, &flag);
 	if (ret)
{ mapping_set_error(page->mapping, ret); end_extent_writepage(page, ret, page_start, page_end); @@ -1839,6 +1886,10 @@ again: goto out; } + if (btrfs_test_opt(BTRFS_I(inode)->root, HOT_MOVE)) + set_extent_hot(inode, page_start, page_end, + &cached_state, flag, 0); + btrfs_set_extent_delalloc(inode, page_start, page_end, &cached_state); ClearPageChecked(page); set_page_dirty(page); @@ -4286,20 +4337,21 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len, struct page *page; gfp_t mask = btrfs_alloc_write_mask(mapping); int ret = 0; + int flag = TYPE_ROT; u64 page_start; u64 page_end; if ((offset & (blocksize - 1)) == 0 && (!len || ((len & (blocksize - 1)) == 0))) goto out; - ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE); + ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE, &flag); if (ret) goto out; again: page = find_or_create_page(mapping, index, mask); if (!page) { - btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE); + btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE, flag); ret = -ENOMEM; goto out; } @@ -4338,9 +4390,14 @@ again: clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end, EXTENT_DIRTY | EXTENT_DELALLOC | - EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, + EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG | + EXTENT_HOT | EXTENT_COLD, 0, 0, &cached_state, GFP_NOFS); + if (btrfs_test_opt(root, HOT_MOVE)) + set_extent_hot(inode, page_start, page_end, + &cached_state, flag, 0); + ret = btrfs_set_extent_delalloc(inode, page_start, page_end, &cached_state); if (ret) { @@ -4367,7 +4424,7 @@ again: out_unlock: if (ret) - btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE); + btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE, flag); unlock_page(page); page_cache_release(page); out: @@ -7379,6 +7436,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, struct inode *inode = file->f_mapping->host; size_t count = 0; int flags = 0; + int flag = TYPE_ROT; bool wakeup = true; bool relock = false; ssize_t ret; @@ -7401,7 
+7459,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, mutex_unlock(&inode->i_mutex); relock = true; } - ret = btrfs_delalloc_reserve_space(inode, count); + ret = btrfs_delalloc_reserve_space(inode, count, &flag); if (ret) goto out; } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK, @@ -7417,10 +7475,10 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, btrfs_submit_direct, flags); if (rw & WRITE) { if (ret < 0 && ret != -EIOCBQUEUED) - btrfs_delalloc_release_space(inode, count); + btrfs_delalloc_release_space(inode, count, flag); else if (ret >= 0 && (size_t)ret < count) btrfs_delalloc_release_space(inode, - count - (size_t)ret); + count - (size_t)ret, flag); else btrfs_delalloc_release_metadata(inode, 0); } @@ -7543,7 +7601,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned long offset) clear_extent_bit(tree, page_start, page_end, EXTENT_DIRTY | EXTENT_DELALLOC | EXTENT_LOCKED | EXTENT_DO_ACCOUNTING | - EXTENT_DEFRAG, 1, 0, &cached_state, GFP_NOFS); + EXTENT_DEFRAG | EXTENT_HOT | EXTENT_COLD, + 1, 0, &cached_state, GFP_NOFS); /* * whoever cleared the private bit is responsible * for the finish_ordered_io @@ -7559,7 +7618,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned long offset) } clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED | EXTENT_DIRTY | EXTENT_DELALLOC | - EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 1, 1, + EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG | + EXTENT_HOT | EXTENT_COLD, 1, 1, &cached_state, GFP_NOFS); __btrfs_releasepage(page, GFP_NOFS); @@ -7599,11 +7659,12 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) loff_t size; int ret; int reserved = 0; + int flag = TYPE_ROT; u64 page_start; u64 page_end; sb_start_pagefault(inode->i_sb); - ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE); + ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE, &flag); if (!ret) { ret = file_update_time(vma->vm_file); reserved = 1; @@ -7658,9 +7719,14 @@ again: 
*/ clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end, EXTENT_DIRTY | EXTENT_DELALLOC | - EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, + EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG | + EXTENT_HOT | EXTENT_COLD, 0, 0, &cached_state, GFP_NOFS); + if (btrfs_test_opt(root, HOT_MOVE)) + set_extent_hot(inode, page_start, page_end, + &cached_state, flag, 0); + ret = btrfs_set_extent_delalloc(inode, page_start, page_end, &cached_state); if (ret) { @@ -7700,7 +7766,7 @@ out_unlock: } unlock_page(page); out: - btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE); + btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE, flag); out_noreserve: sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0f81d67..9401229 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -56,6 +56,7 @@ #include "rcu-string.h" #include "send.h" #include "dev-replace.h" +#include "hot_relocate.h" /* Mask out flags that are inappropriate for the given type of inode. */ static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags) @@ -1001,6 +1002,7 @@ static int cluster_pages_for_defrag(struct inode *inode, int ret; int i; int i_done; + int flag = TYPE_ROT; struct btrfs_ordered_extent *ordered; struct extent_state *cached_state = NULL; struct extent_io_tree *tree; @@ -1013,7 +1015,7 @@ static int cluster_pages_for_defrag(struct inode *inode, page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1); ret = btrfs_delalloc_reserve_space(inode, - page_cnt << PAGE_CACHE_SHIFT); + page_cnt << PAGE_CACHE_SHIFT, &flag); if (ret) return ret; i_done = 0; @@ -1101,9 +1103,12 @@ again: BTRFS_I(inode)->outstanding_extents++; spin_unlock(&BTRFS_I(inode)->lock); btrfs_delalloc_release_space(inode, - (page_cnt - i_done) << PAGE_CACHE_SHIFT); + (page_cnt - i_done) << PAGE_CACHE_SHIFT, flag); } + if (btrfs_test_opt(BTRFS_I(inode)->root, HOT_MOVE)) + set_extent_hot(inode, page_start, page_end - 1, + &cached_state, flag, 0); 
set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1, &cached_state, GFP_NOFS); @@ -1126,7 +1131,8 @@ out: unlock_page(pages[i]); page_cache_release(pages[i]); } - btrfs_delalloc_release_space(inode, page_cnt << PAGE_CACHE_SHIFT); + btrfs_delalloc_release_space(inode, + page_cnt << PAGE_CACHE_SHIFT, flag); return ret; } @@ -3021,8 +3027,9 @@ static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg) u64 types[] = {BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_SYSTEM, BTRFS_BLOCK_GROUP_METADATA, - BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA}; - int num_types = 4; + BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA, + BTRFS_BLOCK_GROUP_DATA_NONROT}; + int num_types = 5; int alloc_size; int ret = 0; u64 slot_count = 0; diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 4febca4..9ea9d6c 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -31,6 +31,7 @@ #include "async-thread.h" #include "free-space-cache.h" #include "inode-map.h" +#include "hot_relocate.h" /* * backref_node, mapping_node and tree_block start with this @@ -2938,12 +2939,13 @@ int prealloc_file_extent_cluster(struct inode *inode, u64 num_bytes; int nr = 0; int ret = 0; + int flag = TYPE_ROT; BUG_ON(cluster->start != cluster->boundary[0]); mutex_lock(&inode->i_mutex); ret = btrfs_check_data_free_space(inode, cluster->end + - 1 - cluster->start); + 1 - cluster->start, &flag); if (ret) goto out; @@ -2965,7 +2967,7 @@ int prealloc_file_extent_cluster(struct inode *inode, nr++; } btrfs_free_reserved_data_space(inode, cluster->end + - 1 - cluster->start); + 1 - cluster->start, flag); out: mutex_unlock(&inode->i_mutex); return ret; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f13517b..9ee751f 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -58,6 +58,7 @@ #include "rcu-string.h" #include "dev-replace.h" #include "free-space-cache.h" +#include "hot_relocate.h" #define CREATE_TRACE_POINTS #include @@ -1521,7 +1522,8 @@ static 
int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) mutex_lock(&fs_info->chunk_mutex); rcu_read_lock(); list_for_each_entry_rcu(found, head, list) { - if (found->flags & BTRFS_BLOCK_GROUP_DATA) { + if ((found->flags & BTRFS_BLOCK_GROUP_DATA) || + (found->flags & BTRFS_BLOCK_GROUP_DATA_NONROT)) { total_free_data += found->disk_total - found->disk_used; total_free_data -= btrfs_account_ro_block_groups_free_space(found); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8bffb91..8b6beec 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1451,6 +1451,9 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) all_avail = root->fs_info->avail_data_alloc_bits | root->fs_info->avail_system_alloc_bits | root->fs_info->avail_metadata_alloc_bits; + if (btrfs_test_opt(root, HOT_MOVE)) + all_avail |= + root->fs_info->avail_data_nonrot_alloc_bits; } while (read_seqretry(&root->fs_info->profiles_lock, seq)); num_devices = root->fs_info->fs_devices->num_devices; @@ -3728,7 +3731,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, devs_increment = btrfs_raid_array[index].devs_increment; ncopies = btrfs_raid_array[index].ncopies; - if (type & BTRFS_BLOCK_GROUP_DATA) { + if (type & BTRFS_BLOCK_GROUP_DATA || + type & BTRFS_BLOCK_GROUP_DATA_NONROT) { max_stripe_size = 1024 * 1024 * 1024; max_chunk_size = 10 * max_stripe_size; } else if (type & BTRFS_BLOCK_GROUP_METADATA) { @@ -3767,9 +3771,30 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, struct btrfs_device *device; u64 max_avail; u64 dev_offset; + int dev_rot; + int skip = 0; device = list_entry(cur, struct btrfs_device, dev_alloc_list); + /* + * If HOT_MOVE is set, the chunk type being allocated + * determines which disks the data may be allocated on. + * This can cause problems if, for example, the data alloc + * profile is RAID0 and there are only two devices, 1 SSD + + * 1 HDD. 
All allocations to BTRFS_BLOCK_GROUP_DATA_NONROT + * in this config will return -ENOSPC as the allocation code + * can't find allowable space for the second stripe. + */ + dev_rot = !blk_queue_nonrot(bdev_get_queue(device->bdev)); + if (btrfs_test_opt(extent_root, HOT_MOVE)) { + int ret1 = type & (BTRFS_BLOCK_GROUP_DATA | + BTRFS_BLOCK_GROUP_METADATA | + BTRFS_BLOCK_GROUP_SYSTEM) && !dev_rot; + int ret2 = type & BTRFS_BLOCK_GROUP_DATA_NONROT && dev_rot; + if (ret1 || ret2) + skip = 1; + } + cur = cur->next; if (!device->writeable) { @@ -3778,7 +3803,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, continue; } - if (!device->in_fs_metadata || + if (skip || !device->in_fs_metadata || device->is_tgtdev_for_dev_replace) continue;