Message ID | 1351333721-3220-2-git-send-email-bo.li.liu@oracle.com (mailing list archive) |
---|---|
State | New, archived |
On Sat, Oct 27, 2012 at 06:28:41PM +0800, Liu Bo wrote:
> This feature works on our crucial write endio path, so if we've got
> lots of fragments to process, it will be kind of a disaster to the
> performance, so I make such a change.
>
> One can benefit from it while mounting with '-o snap_aware_defrag'.

I vote for a more fine grained control over this feature, i.e. via
'btrfs fi defrag', off by default (current behaviour). The defrag ioctl
is the only place that actually calls set_extent_defrag, so this will
not affect normal operation and is fully in the hands of the user who
runs defrag.

Do you have a use case for setting it through the mount option?

thanks,
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/31/2012 07:31 AM, David Sterba wrote:
> On Sat, Oct 27, 2012 at 06:28:41PM +0800, Liu Bo wrote:
>> This feature works on our crucial write endio path, so if we've got
>> lots of fragments to process, it will be kind of a disaster to the
>> performance, so I make such a change.
>>
>> One can benefit from it while mounting with '-o snap_aware_defrag'.
>
> I vote for a more fine grained control over this feature, i.e. via
> 'btrfs fi defrag', off by default (current behaviour). The defrag ioctl
> is the only place that actually calls set_extent_defrag, so this will
> not affect normal operation and is fully in the hands of the user who
> runs defrag.
>

Besides 'btrfs fi defrag', mounting with autodefrag may also do the same thing.

But controlling it via 'btrfs fi defrag' may actually be a good idea.

thanks,
liubo

> Do you have a use case for setting it through the mount option?
>
> thanks,
> david
On Wed, Oct 31, 2012 at 08:34:38AM +0800, Liu Bo wrote:
> Besides 'btrfs fi defrag', mounting with autodefrag may also do the same thing.
Ok, autodefrag, good point. Then I suggest making snapshot-aware defrag a
mode of autodefrag, not a separate option (because a separate option would
make no sense other than as an alias for "autodefrag=snapshotaware").
david
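As a rough illustration of the suggestion above, a mount-option parser could treat snapshot awareness as a mode of autodefrag rather than as a separate flag. The sketch below is plain userspace C, not kernel code: the enum, its values, and parse_autodefrag() are hypothetical names made up for this example, and only the option strings "autodefrag" and "autodefrag=snapshotaware" come from the thread.

```c
#include <string.h>

/* Hypothetical modes; only the option strings come from the thread. */
enum autodefrag_mode {
	AUTODEFRAG_OFF = 0,		/* option is not an autodefrag option */
	AUTODEFRAG_PLAIN,		/* "autodefrag" */
	AUTODEFRAG_SNAPSHOT_AWARE,	/* "autodefrag=snapshotaware" */
	AUTODEFRAG_INVALID,		/* "autodefrag=<unknown mode>" */
};

/* Classify a single mount option token (hypothetical helper). */
static enum autodefrag_mode parse_autodefrag(const char *opt)
{
	static const char key[] = "autodefrag";
	size_t klen = sizeof(key) - 1;

	if (strncmp(opt, key, klen) != 0)
		return AUTODEFRAG_OFF;		/* some other option */
	if (opt[klen] == '\0')
		return AUTODEFRAG_PLAIN;	/* bare "autodefrag" */
	if (opt[klen] != '=')
		return AUTODEFRAG_OFF;		/* e.g. "autodefragfoo" */
	if (strcmp(opt + klen + 1, "snapshotaware") == 0)
		return AUTODEFRAG_SNAPSHOT_AWARE;
	return AUTODEFRAG_INVALID;
}
```

With this shape a bare "autodefrag" keeps its existing meaning, and an unrecognised mode can be rejected at mount time instead of being silently ignored.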
On 10/31/2012 08:44 AM, David Sterba wrote:
> On Wed, Oct 31, 2012 at 08:34:38AM +0800, Liu Bo wrote:
>> Besides 'btrfs fi defrag', mounting with autodefrag may also do the same thing.
>
> Ok, autodefrag, good point. Then I suggest making snapshot-aware defrag a
> mode of autodefrag, not a separate option (because a separate option would
> make no sense other than as an alias for "autodefrag=snapshotaware").
>

Hmm, you might be right.

But I have to say 'snapshot-aware defrag' is kind of a trade-off.

1. The good case:
Say a file is full of fragments and we make a snapshot based on the file's root:

            fs root, snapshot
              /     |     \
             /      |      \
   | - - | ... | - - - | ... | - - - |
     p1          p2            p3

then we do a snapshot-aware defrag, it will be

      fs root    snapshot
          \        /
           \      /
      | - - - - - - - - |
       a whole new extent

We achieve the goal!

2. The bad case:
Say we have a file with a whole extent and a snapshot on it at the very first:

      fs root    snapshot
          \        /
           \      /
      | - - - - - - - - |

then, we write into part of the file, with COW it will be:

   | - - || - - - || - - - | ... | - - - |
     p1      p2       p3          p2_new

(file in snapshot -> p1 + p2 + p3)
(file in fs root -> p1 + p2_new + p3)

then, we do a snapshot-aware defrag, it will be

   | - - || - - - || - - - | ... | - - - |
     p1      p2       p3          p2_new
                    ||
                    VV
   | - - |         | - - - | ... ... | - - | - - - | - - - |
     p1               p3                    new extent

(file in snapshot -> p1 + p3 + middle of new extent)
(file in fs root -> new extent)

So we're making the file in the snapshot worse than before, although we get
a good one for the file in fs root.

thanks,
liubo

>
> david
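The bad case above can be checked with a small model that counts how many physically contiguous runs each view of the file is made of. This is only a sketch under stated assumptions: struct extent_ref, count_fragments(), and all the physical offsets are invented for illustration (the original extent is placed at 0..8, p2_new at 100, the defragged extent at 200..208), and none of it is btrfs code.

```c
#include <stddef.h>

/* One file extent reference: physical start and length, arbitrary units. */
struct extent_ref {
	unsigned long phys;
	unsigned long len;
};

/* Number of physically contiguous runs formed by a file's extent list. */
static size_t count_fragments(const struct extent_ref *refs, size_t n)
{
	size_t i, frags = n ? 1 : 0;

	for (i = 1; i < n; i++)
		if (refs[i - 1].phys + refs[i - 1].len != refs[i].phys)
			frags++;
	return frags;
}

/*
 * Bad case with made-up numbers: p1 = [0,2), p2 = [2,5), p3 = [5,8),
 * p2_new at 100, new defragged extent at [200,208).
 */
static const struct extent_ref snap_before[] = { {0, 2}, {2, 3}, {5, 3} };
static const struct extent_ref root_before[] = { {0, 2}, {100, 3}, {5, 3} };
/* After defrag the snapshot keeps p1 and p3 but points into the new extent. */
static const struct extent_ref snap_after[] = { {0, 2}, {202, 3}, {5, 3} };
static const struct extent_ref root_after[] = { {200, 8} };
```

Under these assumptions the fs root view goes from three runs to one, while the snapshot view goes from one run to three, which is the trade-off described in the mail.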
On Sat, Oct 27, 2012 at 04:28:41AM -0600, Liu Bo wrote:
> This feature works on our crucial write endio path, so if we've got
> lots of fragments to process, it will be kind of a disaster to the
> performance, so I make such a change.
>
> One can benefit from it while mounting with '-o snap_aware_defrag'.

I think we should always prefer to maintain snapshot cloning as much as
possible, and have a specific option to defrag that makes it break the
clone in favor of removing fragmentation.

So, please keep the snapshot aware defrag the default ;)

Thanks for taking these patches up again!

-chris
On 11/01/2012 10:43 PM, Chris Mason wrote:
> On Sat, Oct 27, 2012 at 04:28:41AM -0600, Liu Bo wrote:
>> This feature works on our crucial write endio path, so if we've got
>> lots of fragments to process, it will be kind of a disaster to the
>> performance, so I make such a change.
>>
>> One can benefit from it while mounting with '-o snap_aware_defrag'.
>
> I think we should always prefer to maintain snapshot cloning as much as
> possible, and have a specific option to defrag that makes it break the
> clone in favor of removing fragmentation.
>

Oh yeah, so I was considering the existing btrfs partitions that have
already broken the cloning relationship.

> So, please keep the snapshot aware defrag the default ;)
>

All right, that'd be nice, just drop this patch.

thanks,
liubo

> Thanks for taking these patches up again!
>
> -chris
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 926c9ff..f9cd9c9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1756,6 +1756,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY	(1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
+#define BTRFS_MOUNT_SA_DEFRAG		(1 << 23)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 35e6993..069499e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2488,13 +2488,17 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 			 ordered_extent->file_offset + ordered_extent->len - 1,
 			 0, &cached_state);
 
-	ret = test_range_bit(io_tree, ordered_extent->file_offset,
-			ordered_extent->file_offset + ordered_extent->len - 1,
-			EXTENT_DEFRAG, 1, cached_state);
-	if (ret && btrfs_root_last_snapshot(&root->root_item) >=
+	if (btrfs_test_opt(root, SA_DEFRAG)) {
+		ret = test_range_bit(io_tree, ordered_extent->file_offset,
+				ordered_extent->file_offset +
+				ordered_extent->len - 1,
+				EXTENT_DEFRAG, 1, cached_state);
+		if (ret &&
+		    btrfs_root_last_snapshot(&root->root_item) >=
 						BTRFS_I(inode)->generation) {
-		/* the inode is shared */
-		new = record_old_file_extents(inode, ordered_extent);
+			/* the inode is shared */
+			new = record_old_file_extents(inode, ordered_extent);
+		}
 	}
 
 	if (nolock)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6116880..1367165 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1058,8 +1058,9 @@ again:
 	}
 
 
-	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
-			  &cached_state, GFP_NOFS);
+	if (btrfs_test_opt(BTRFS_I(inode)->root, SA_DEFRAG))
+		set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start,
+				  page_end - 1, &cached_state, GFP_NOFS);
 
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 			     page_start, page_end - 1, &cached_state,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 915ac14..24eac5f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -308,8 +308,8 @@ enum {
 	Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
 	Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
 	Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
-	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
-	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
+	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_sa_defrag,
+	Opt_inode_cache, Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors,
 	Opt_err,
@@ -344,6 +344,7 @@ static match_table_t tokens = {
 	{Opt_enospc_debug, "enospc_debug"},
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
+	{Opt_sa_defrag, "snap_aware_defrag"},
 	{Opt_inode_cache, "inode_cache"},
 	{Opt_no_space_cache, "nospace_cache"},
 	{Opt_recovery, "recovery"},
@@ -564,6 +565,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			printk(KERN_INFO "btrfs: enabling auto defrag\n");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
+		case Opt_sa_defrag:
+			printk(KERN_INFO "btrfs: enabling snapshot-aware"
+			       " defrag\n");
+			btrfs_set_opt(info->mount_opt, SA_DEFRAG);
+			break;
 		case Opt_recovery:
 			printk(KERN_INFO "btrfs: enabling auto recovery\n");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
@@ -935,6 +941,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",enospc_debug");
 	if (btrfs_test_opt(root, AUTO_DEFRAG))
 		seq_puts(seq, ",autodefrag");
+	if (btrfs_test_opt(root, SA_DEFRAG))
+		seq_puts(seq, ",snap_aware_defrag");
 	if (btrfs_test_opt(root, INODE_MAP_CACHE))
 		seq_puts(seq, ",inode_cache");
 	if (btrfs_test_opt(root, SKIP_BALANCE))
This feature works on our crucial write endio path, so if we've got
lots of fragments to process, it will be kind of a disaster to the
performance, so I make such a change.

One can benefit from it while mounting with '-o snap_aware_defrag'.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ctree.h |    1 +
 fs/btrfs/inode.c |   16 ++++++++++------
 fs/btrfs/ioctl.c |    5 +++--
 fs/btrfs/super.c |   12 ++++++++++--
 4 files changed, 24 insertions(+), 10 deletions(-)
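For readers unfamiliar with how the mount option gates the feature, here is a minimal userspace sketch of the bit-flag pattern the patch relies on. The two BTRFS_MOUNT_* values and the set/clear macros are copied from the ctree.h hunk; test_opt() is a simplified stand-in (the kernel's btrfs_test_opt takes a root, not a bitmask), and mount_with_sa_defrag() and defrag_tracks_snapshots() are hypothetical helpers added only for illustration.

```c
/* Flag values as they appear in the ctree.h hunk of this patch. */
#define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
#define BTRFS_MOUNT_SA_DEFRAG			(1 << 23)

/* Set/clear macros as shown in the diff context. */
#define btrfs_clear_opt(o, opt)	((o) &= ~BTRFS_MOUNT_##opt)
#define btrfs_set_opt(o, opt)	((o) |= BTRFS_MOUNT_##opt)

/* Simplified stand-in: tests a plain bitmask instead of a btrfs root. */
#define test_opt(o, opt)	(((o) & BTRFS_MOUNT_##opt) != 0)

/* Hypothetical: what parsing "-o snap_aware_defrag" does to the mask. */
static unsigned long mount_with_sa_defrag(unsigned long mount_opt)
{
	btrfs_set_opt(mount_opt, SA_DEFRAG);
	return mount_opt;
}

/*
 * Hypothetical: mirrors the gate the patch puts around
 * set_extent_defrag() and the EXTENT_DEFRAG check at write endio.
 */
static int defrag_tracks_snapshots(unsigned long mount_opt)
{
	return test_opt(mount_opt, SA_DEFRAG);
}
```

Without the option the mask never carries bit 23, so in this model both the defrag ioctl and the endio path skip the snapshot-aware bookkeeping.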