[RFC] btrfs/ioctl.c: Prefer inode with lowest offset as source for clone

Message ID CAGqmi76ZKv3ezguTyxGL_Jv_xtq5w_SGPvjwTy90Z=oNY73Vmw@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Timofey Titovets Oct. 20, 2015, 1:29 p.m. UTC
For performance reasons, it is preferable to keep data at the start of the
disk when deduplicating.
This might make sense for two reasons:
1. Spinning rust: the start of the disk is much faster.
2. Btrfs can deallocate empty data chunks from the end of the fs, i.e. it compacts the fs.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/ioctl.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Filipe Manana Oct. 20, 2015, 2:56 p.m. UTC | #1
On Tue, Oct 20, 2015 at 2:29 PM, Timofey Titovets <nefelim4ag@gmail.com> wrote:
> For performance reason, leave data at the start of disk, is preferable
> while deduping

Have you made any performance tests to verify that?

> It's might sense for the reasons:
> 1. Spinning rust - start of the disk is much faster
> 2. Btrfs can deallocate empty data chunk from the end of fs - ie it's compact fs

I don't see why that makes sense. First, the clone/extent_same ioctls
don't copy data; they just update the metadata of the destination inode to
point to the same extents as the source inode. Second, just because
the offset in one file is lower than the offset in the other file, it doesn't
mean the physical (on-disk) offset of the first file is lower than
that of the other file...
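
Both points can be illustrated with a toy model. The sketch below is not btrfs code: `toy_file`, `toy_clone`, `lower_file_offset` and `lower_disk_addr` are hypothetical names, and the "disk addresses" are made-up numbers. It only shows that a clone repoints a mapping entry without moving data, and that comparing file offsets says nothing about where the data sits on disk.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_BLOCKS 4

/* Toy model, not btrfs code: a file is a table that maps each file
 * block number to the disk address of the extent holding its data. */
struct toy_file {
    uint64_t disk_addr[TOY_BLOCKS];
};

/* Model of a clone: repoint one entry of dst at src's extent.
 * Only the mapping (metadata) changes; no data is copied or moved. */
void toy_clone(const struct toy_file *src, unsigned src_blk,
               struct toy_file *dst, unsigned dst_blk)
{
    dst->disk_addr[dst_blk] = src->disk_addr[src_blk];
}

/* The kind of comparison the RFC patch makes: file offsets only. */
bool lower_file_offset(unsigned blk_a, unsigned blk_b)
{
    return blk_a < blk_b;
}

/* The comparison the commit message actually cares about. */
bool lower_disk_addr(const struct toy_file *a, unsigned blk_a,
                     const struct toy_file *b, unsigned blk_b)
{
    return a->disk_addr[blk_a] < b->disk_addr[blk_b];
}
```

If block 0 of file `a` lives at address 9000 and block 3 of file `b` lives at address 100, `lower_file_offset(0, 3)` is true while `lower_disk_addr(&a, 0, &b, 3)` is false, so choosing the source by file offset would keep the copy that is further from the start of the disk.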

>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/ioctl.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src,
> u64 loff, u64 olen,
>
>   /* pass original length for comparison so we stay within i_size */
>   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> - if (ret == 0)
> -     ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + if (ret == 0) {
> +     /* prefer inode with lowest offset as source for clone */
> +     if (loff > dst_loff)
> +         ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> +     else
> +         ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + }
>
>   if (same_inode)
>   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> --
> 2.6.1
Timofey Titovets Oct. 20, 2015, 4:19 p.m. UTC | #2
2015-10-20 17:56 GMT+03:00, Filipe Manana <fdmanana@gmail.com>:
> On Tue, Oct 20, 2015 at 2:29 PM, Timofey Titovets <nefelim4ag@gmail.com>
> wrote:
>> For performance reason, leave data at the start of disk, is preferable
>> while deduping
>
> Have you made any performance tests to verify that?


Nope, I haven't run any performance tests yet.

It's like defragmentation: it can give a boost in specific cases, assuming
that the beginning sectors of an HDD are faster
(this is true, because the sector count is not equal at the beginning
and the end of the HDD space).

And as long as I am not sure about my code, it's useless.

>> It's might sense for the reasons:
>> 1. Spinning rust - start of the disk is much faster
>> 2. Btrfs can deallocate empty data chunk from the end of fs - ie it's
>> compact fs
>
> I don't see why that makes sense. First the clone/extent_same ioctls
> don't copy data, they just update metadata of the destination inode to
> point to the same extents as the source inode,

It's true, but as I said before, data at the beginning of an HDD can be
accessed faster than data at the end of the HDD.
I.e. it gives a boost after dedup, not while processing requests.

>secondly, just because
> an offset of a file is lower the offset of the other file, it doesn't
> mean the physical (on disk) offset of the first file is lower than
> that of the other file...

Oh, so sad, then I must go deeper into the code -.-

Zygo Blaxell Oct. 23, 2015, 12:14 a.m. UTC | #3
On Tue, Oct 20, 2015 at 04:29:46PM +0300, Timofey Titovets wrote:
> For performance reason, leave data at the start of disk, is preferable
> while deduping
> It's might sense for the reasons:
> 1. Spinning rust - start of the disk is much faster
> 2. Btrfs can deallocate empty data chunk from the end of fs - ie it's compact fs

"src" is the extent that is kept, and "dst" is the extent that is
discarded.  When both extents are shared, the dedup userspace has to
pass a common "src" with many different "dst" over several extent-same
calls in order to get rid of all of the references to the "dst" extent.

If "src" and "dst" are arbitrarily swapped over multiple extent-same calls
then it becomes impossible to dedup shared extents.  Heck, if there are
more than two extents even in one extent-same call then it stops working.
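
The failure mode described above can be sketched with a small simulation. This is a toy model under stated assumptions, not kernel code: `struct ref`, `toy_extent_same`, `refs_to` and `run_plan` are hypothetical names, each reference is reduced to a file offset plus an extent id, and the swap rule mirrors the offset comparison in the RFC patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one reference = (file offset, id of the extent it points to). */
struct ref {
    uint64_t file_off;
    int extent;
};

/* Model of one extent-same call.  Normally dst is repointed at src's
 * extent.  With swap_by_offset, the roles are swapped whenever the
 * source offset is the higher one, as in the RFC patch. */
void toy_extent_same(struct ref *src, struct ref *dst, bool swap_by_offset)
{
    if (swap_by_offset && src->file_off > dst->file_off)
        src->extent = dst->extent;
    else
        dst->extent = src->extent;
}

/* Count the references that still point to `extent`. */
size_t refs_to(const struct ref *refs, size_t n, int extent)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (refs[i].extent == extent)
            count++;
    return count;
}

/* Userspace plan: keep extent 1 (via reference A) and discard extent 2
 * by deduping C and D against A.  Returns how many references are left
 * on extent 2 once the plan has run. */
size_t run_plan(bool swap_by_offset)
{
    struct ref r[] = {
        { 4096,  1 }, /* A */
        { 8192,  1 }, /* B, shares extent 1 with A */
        { 0,     2 }, /* C */
        { 12288, 2 }, /* D, shares extent 2 with C */
    };

    toy_extent_same(&r[0], &r[2], swap_by_offset);
    toy_extent_same(&r[0], &r[3], swap_by_offset);
    return refs_to(r, 4, 2);
}
```

In this model `run_plan(false)` leaves no references on extent 2, so it can be freed. `run_plan(true)` leaves three references on extent 2 and one (B) on extent 1: the first call silently moved A onto extent 2, so neither extent is released even though userspace issued exactly the calls it planned.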

It would be possible to have dedup figure out which extent the kernel
picked after the fact, but that's totally unnecessary extra work in
cases where the userspace has a good reason to pick the extents it did
(e.g. administrator hints about future usage of the files where the
extents were found).

Dedup userspace can figure out the physical addresses of the extents
and rearrange the arguments itself if desired.
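
One way userspace could do that lookup is the generic FIEMAP ioctl. The following is a minimal sketch, assuming a Linux system with `<linux/fiemap.h>`; `extent_address` and `pick_source` are hypothetical helper names, holes and inline extents are not handled specially, and on chunk-mapped filesystems such as btrfs the reported value may be an address in the filesystem's own logical address space rather than a raw device offset.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hypothetical helper: report the address FIEMAP gives for the byte at
 * file offset `off`.  Returns 0 on success, -1 on error or when no
 * extent is mapped at that offset (for example a hole). */
int extent_address(int fd, uint64_t off, uint64_t *addr)
{
    /* Header plus room for exactly one extent record. */
    size_t sz = sizeof(struct fiemap) + sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    int ret = -1;

    if (!fm)
        return -1;

    fm->fm_start = off;
    fm->fm_length = 1;
    fm->fm_flags = FIEMAP_FLAG_SYNC; /* flush delayed allocations first */
    fm->fm_extent_count = 1;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0 && fm->fm_mapped_extents == 1) {
        const struct fiemap_extent *fe = &fm->fm_extents[0];

        /* Translate the file offset into an offset within the extent. */
        *addr = fe->fe_physical + (off - fe->fe_logical);
        ret = 0;
    }

    free(fm);
    return ret;
}

/* Hypothetical policy: pass the candidate with the lower address as
 * the source.  Returns 0 to keep candidate a, 1 to keep candidate b. */
int pick_source(uint64_t addr_a, uint64_t addr_b)
{
    return addr_b < addr_a;
}
```

A dedup tool could call `extent_address()` on both candidates and use `pick_source()` to order the arguments of its extent-same call, while still being free to ignore that ordering when it has a better reason to keep a particular extent.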

> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/ioctl.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src,
> u64 loff, u64 olen,
> 
>   /* pass original length for comparison so we stay within i_size */
>   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> - if (ret == 0)
> -     ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + if (ret == 0) {
> +     /* prefer inode with lowest offset as source for clone */
> +     if (loff > dst_loff)
> +         ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> +     else
> +         ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + }
> 
>   if (same_inode)
>   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> -- 
> 2.6.1

Timofey Titovets Oct. 23, 2015, 11:51 a.m. UTC | #4
Zygo, you are right
Thread closed, thanks


Patch

From 5ed3822bc308c726d91a837fbd97ebacaa51e58d Mon Sep 17 00:00:00 2001
From: Timofey Titovets <nefelim4ag@gmail.com>
Date: Tue, 20 Oct 2015 15:53:20 +0300
Subject: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source for
 clone

For performance reasons, it is preferable to leave data at the start of the disk

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/ioctl.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3e3e613..3eb77c0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3074,8 +3074,13 @@  static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 
 	/* pass original length for comparison so we stay within i_size */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
-	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	if (ret == 0) {
+		/* prefer inode with lowest offset as source for clone */
+		if (loff > dst_loff)
+			ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
+		else
+			ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	}
 
 	if (same_inode)
 		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
-- 
2.6.1