[2/2] Btrfs: make snapshot-aware defrag as a mount option

Message ID 1351333721-3220-2-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State New, archived

Commit Message

Liu Bo Oct. 27, 2012, 10:28 a.m. UTC
Snapshot-aware defrag runs in our critical write endio path, so if there
are lots of fragments to process it can be a disaster for performance.
Make it opt-in through a mount option instead.

One can benefit from it by mounting with '-o snap_aware_defrag'.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ctree.h |    1 +
 fs/btrfs/inode.c |   16 ++++++++++------
 fs/btrfs/ioctl.c |    5 +++--
 fs/btrfs/super.c |   12 ++++++++++--
 4 files changed, 24 insertions(+), 10 deletions(-)

Comments

David Sterba Oct. 30, 2012, 11:31 p.m. UTC | #1
On Sat, Oct 27, 2012 at 06:28:41PM +0800, Liu Bo wrote:
> This feature works on our crucial write endio path, so if we've got
> lots of fragments to process, it will be kind of a disaster to the
> performance, so I make such a change.
> 
> One can benefit from it while mounting with '-o snap_aware_defrag'.

I vote for more fine-grained control over this feature, i.e. via
'btrfs fi defrag', off by default (current behaviour). The defrag ioctl
is the only place that actually calls set_extent_defrag, so this will
not affect normal operation and is fully in the hands of the user who
runs defrag.

Do you have a use case for setting it through the mount option?

thanks,
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Oct. 31, 2012, 12:34 a.m. UTC | #2
On 10/31/2012 07:31 AM, David Sterba wrote:
> On Sat, Oct 27, 2012 at 06:28:41PM +0800, Liu Bo wrote:
>> This feature works on our crucial write endio path, so if we've got
>> lots of fragments to process, it will be kind of a disaster to the
>> performance, so I make such a change.
>>
>> One can benefit from it while mounting with '-o snap_aware_defrag'.
> 
> I vote for more fine-grained control over this feature, i.e. via
> 'btrfs fi defrag', off by default (current behaviour). The defrag ioctl
> is the only place that actually calls set_extent_defrag, so this will
> not affect normal operation and is fully in the hands of the user who
> runs defrag.
> 

Besides 'btrfs fi defrag', mounting with autodefrag may also do the same thing.

But controlling it through 'btrfs fi defrag' could actually be a good idea.

thanks,
liubo

> Do you have a use case for setting it through the mount option?
> 
> thanks,
> david

David Sterba Oct. 31, 2012, 12:44 a.m. UTC | #3
On Wed, Oct 31, 2012 at 08:34:38AM +0800, Liu Bo wrote:
> Besides 'btrfs fi defrag', mounting with autodefrag may also do the same thing.

Ok, autodefrag, good point. Then I suggest making snapshot-aware a mode
of autodefrag rather than a separate option (a separate option would make
no sense other than as an alias for "autodefrag=snapshotaware").
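
To make the suggestion concrete, here is a minimal userspace sketch of how a single option string could carry the mode. The "autodefrag=snapshotaware" spelling comes from the message above; the enum, the function name and the return values are hypothetical and are not part of the patch or of the kernel's option parser.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum demo_autodefrag_mode {
	DEMO_AD_NOT_AUTODEFRAG,  /* the string is some other option */
	DEMO_AD_PLAIN,           /* "autodefrag" */
	DEMO_AD_SNAPSHOT_AWARE,  /* "autodefrag=snapshotaware" */
	DEMO_AD_INVALID          /* "autodefrag=<unknown value>" */
};

/* Classify one mount option token (no commas) as an autodefrag mode. */
static enum demo_autodefrag_mode demo_parse_autodefrag(const char *opt)
{
	static const char key[] = "autodefrag";
	const size_t klen = sizeof(key) - 1;

	assert(opt);

	/* Reject anything that is not exactly the key or key=value. */
	if (strncmp(opt, key, klen) != 0 ||
	    (opt[klen] != '\0' && opt[klen] != '='))
		return DEMO_AD_NOT_AUTODEFRAG;

	if (opt[klen] == '\0')
		return DEMO_AD_PLAIN;

	if (strcmp(opt + klen + 1, "snapshotaware") == 0)
		return DEMO_AD_SNAPSHOT_AWARE;

	return DEMO_AD_INVALID;
}
```

With this shape, a bare "autodefrag" keeps its existing meaning and the snapshot-aware behaviour needs no second option name.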


david
Liu Bo Oct. 31, 2012, 1:31 p.m. UTC | #4
On 10/31/2012 08:44 AM, David Sterba wrote:
> On Wed, Oct 31, 2012 at 08:34:38AM +0800, Liu Bo wrote:
>> Besides 'btrfs fi defrag', mounting with autodefrag may also do the same thing.
> 
> Ok, autodefrag, good point. Then I suggest making snapshot-aware a mode
> of autodefrag rather than a separate option (a separate option would make
> no sense other than as an alias for "autodefrag=snapshotaware").
> 

Hmm, you might be right.

But I have to say 'snapshot-aware defrag' is kind of a trade-off.

1. The good case:
Say a file is full of fragments and we take a snapshot of the file's root:

        fs root, snapshot
       /        |        \
      /         |         \
| - - | ... | - - - | ... | - - - |
   p1           p2            p3

Then we do a snapshot-aware defrag, and it becomes:

fs root      snapshot
   \          /
    \        /
| - - - - - - - - |
 a whole new extent

We achieve the goal!


2. The bad case:
Say we start with a file that is one whole extent, with a snapshot of it:

fs root      snapshot
   \          /
    \        /
| - - - - - - - - |

Then we write into part of the file; with COW it becomes:

| - - || - - - || - - - |  ...  | - - - |
   p1      p2       p3            p2_new

(file in snapshot -> p1 + p2 + p3)
(file in fs root -> p1 + p2_new + p3)

Then we do a snapshot-aware defrag, and it becomes:

| - - || - - - || - - - |  ...  | - - - |
   p1      p2      p3            p2_new

                  ||
                  VV

| - - |           | - - - |  ...            ... | - - | - - - | - - - |
   p1                 p3                             new extent

(file in snapshot -> p1 + p3 + middle of new extent)
(file in fs root -> new extent)

So we make the file in the snapshot worse than before, although the file in the fs root ends up as one good extent.
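
The bad case can be checked with a small model that counts physically contiguous runs per file. This is only an illustrative userspace sketch: the struct, the function name and the disk offsets (in arbitrary blocks) are made up to mirror the diagrams above, and nothing here is btrfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one file extent: a run of blocks on disk. */
struct demo_extent {
	uint64_t disk_start; /* physical start, in blocks */
	uint64_t len;        /* length, in blocks */
};

/*
 * Count fragments of a file whose extents are listed in logical order.
 * Two neighbouring extents belong to the same fragment only if the
 * second one starts exactly where the first one ends on disk.
 */
static size_t demo_count_fragments(const struct demo_extent *e, size_t n)
{
	size_t frags = 0;

	assert(e || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i == 0 || e[i - 1].disk_start + e[i - 1].len != e[i].disk_start)
			frags++;
	}
	return frags;
}

/* Snapshot before defrag: p1 + p2 + p3, laid out back to back. */
static const struct demo_extent demo_snap_before[] = {
	{ 0, 2 }, { 2, 3 }, { 5, 3 },
};

/* fs root after the COW write: p1 + p2_new (elsewhere) + p3. */
static const struct demo_extent demo_root_after_cow[] = {
	{ 0, 2 }, { 100, 3 }, { 5, 3 },
};

/* fs root after snapshot-aware defrag: one new 8-block extent. */
static const struct demo_extent demo_root_after_defrag[] = {
	{ 200, 8 },
};

/* Snapshot after defrag: p1 + middle of the new extent + p3. */
static const struct demo_extent demo_snap_after_defrag[] = {
	{ 0, 2 }, { 202, 3 }, { 5, 3 },
};
```

Calling demo_count_fragments() on each array shows the fs root going from three fragments to one while the snapshot goes from one fragment to three, which is the trade-off described above.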


thanks,
liubo

> 
> david

Chris Mason Nov. 1, 2012, 2:43 p.m. UTC | #5
On Sat, Oct 27, 2012 at 04:28:41AM -0600, Liu Bo wrote:
> This feature works on our crucial write endio path, so if we've got
> lots of fragments to process, it will be kind of a disaster to the
> performance, so I make such a change.
> 
> One can benefit from it while mounting with '-o snap_aware_defrag'.

I think we should always prefer to maintain snapshot cloning as much as
possible, and have a specific option to defrag that makes it break the
clone in favor of removing fragmentation.

So, please keep the snapshot aware defrag the default ;)

Thanks for taking these patches up again!

-chris
Liu Bo Nov. 1, 2012, 3:49 p.m. UTC | #6
On 11/01/2012 10:43 PM, Chris Mason wrote:
> On Sat, Oct 27, 2012 at 04:28:41AM -0600, Liu Bo wrote:
>> This feature works on our crucial write endio path, so if we've got
>> lots of fragments to process, it will be kind of a disaster to the
>> performance, so I make such a change.
>>
>> One can benefit from it while mounting with '-o snap_aware_defrag'.
> 
> I think we should always prefer to maintain snapshot cloning as much as
> possible, and have a specific option to defrag that makes it break the
> clone in favor of removing fragmentation.
> 

Oh yeah, I was thinking of existing btrfs partitions that have already
broken the cloning relationship.

> So, please keep the snapshot aware defrag the default ;)
> 

All right, that'd be nice, just drop this patch.

thanks,
liubo

> Thanks for taking these patches up again!
> 
> -chris

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 926c9ff..f9cd9c9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1756,6 +1756,7 @@  struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY	(1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
+#define BTRFS_MOUNT_SA_DEFRAG		(1 << 23)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 35e6993..069499e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2488,13 +2488,17 @@  static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 			 ordered_extent->file_offset + ordered_extent->len - 1,
 			 0, &cached_state);
 
-	ret = test_range_bit(io_tree, ordered_extent->file_offset,
-			ordered_extent->file_offset + ordered_extent->len - 1,
-			EXTENT_DEFRAG, 1, cached_state);
-	if (ret && btrfs_root_last_snapshot(&root->root_item) >=
+	if (btrfs_test_opt(root, SA_DEFRAG)) {
+		ret = test_range_bit(io_tree, ordered_extent->file_offset,
+				     ordered_extent->file_offset +
+				     ordered_extent->len - 1,
+				     EXTENT_DEFRAG, 1, cached_state);
+		if (ret &&
+		    btrfs_root_last_snapshot(&root->root_item) >=
 						BTRFS_I(inode)->generation) {
-		/* the inode is shared */
-		new = record_old_file_extents(inode, ordered_extent);
+			/* the inode is shared */
+			new = record_old_file_extents(inode, ordered_extent);
+		}
 	}
 
 	if (nolock)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6116880..1367165 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1058,8 +1058,9 @@  again:
 	}
 
 
-	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
-			  &cached_state, GFP_NOFS);
+	if (btrfs_test_opt(BTRFS_I(inode)->root, SA_DEFRAG))
+		set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start,
+				  page_end - 1, &cached_state, GFP_NOFS);
 
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 			     page_start, page_end - 1, &cached_state,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 915ac14..24eac5f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -308,8 +308,8 @@  enum {
 	Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
 	Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
 	Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
-	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
-	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
+	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_sa_defrag,
+	Opt_inode_cache, Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors,
 	Opt_err,
@@ -344,6 +344,7 @@  static match_table_t tokens = {
 	{Opt_enospc_debug, "enospc_debug"},
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
+	{Opt_sa_defrag, "snap_aware_defrag"},
 	{Opt_inode_cache, "inode_cache"},
 	{Opt_no_space_cache, "nospace_cache"},
 	{Opt_recovery, "recovery"},
@@ -564,6 +565,11 @@  int btrfs_parse_options(struct btrfs_root *root, char *options)
 			printk(KERN_INFO "btrfs: enabling auto defrag\n");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
+		case Opt_sa_defrag:
+			printk(KERN_INFO "btrfs: enabling snapshot-aware"
+			       " defrag\n");
+			btrfs_set_opt(info->mount_opt, SA_DEFRAG);
+			break;
 		case Opt_recovery:
 			printk(KERN_INFO "btrfs: enabling auto recovery\n");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
@@ -935,6 +941,8 @@  static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",enospc_debug");
 	if (btrfs_test_opt(root, AUTO_DEFRAG))
 		seq_puts(seq, ",autodefrag");
+	if (btrfs_test_opt(root, SA_DEFRAG))
+		seq_puts(seq, ",snap_aware_defrag");
 	if (btrfs_test_opt(root, INODE_MAP_CACHE))
 		seq_puts(seq, ",inode_cache");
 	if (btrfs_test_opt(root, SKIP_BALANCE))