diff mbox series

Btrfs: send, improve clone range

Message ID 1553766036-20689-1-git-send-email-robbieko@synology.com (mailing list archive)
State New, archived
Headers show
Series Btrfs: send, improve clone range | expand

Commit Message

robbieko March 28, 2019, 9:40 a.m. UTC
From: Robbie Ko <robbieko@synology.com>

Improve clone_range two use scenarios.

1. Remove the limit of clone inode size
We can do partial clone range, so there is no need to limit
the inode size.

2. In the scenarios of rewrite or clone_range, data_offset
rarely matches exactly, so the chance of a clone is missed.

e.g.
    1. Write a 1M file
        dd if=/dev/zero of=1M bs=1M count=1

    2. Clone 1M file
       cp --reflink 1M clone

    3. Rewrite 4k on the clone file
       dd if=/dev/zero of=clone bs=4k count=1 conv=notrunc

    The disk layout is as follows:
    item 16 key (257 EXTENT_DATA 0) itemoff 15353 itemsize 53
	extent data disk byte 1103101952 nr 1048576
	extent data offset 0 nr 1048576 ram 1048576
	extent compression(none)
    ...
    item 22 key (258 EXTENT_DATA 0) itemoff 14959 itemsize 53
	extent data disk byte 1104150528 nr 4096
	extent data offset 0 nr 4096 ram 4096
	extent compression(none)
    item 23 key (258 EXTENT_DATA 4096) itemoff 14906 itemsize 53
	extent data disk byte 1103101952 nr 1048576
	extent data offset 4096 nr 1044480 ram 1048576
	extent compression(none)

When send, inode 258 file offset 4096~1048576 (item 23) has a
chance to clone_range, but because data_offset does not match
inode 257 (item 16), it causes missed clone and can only transfer
actual data.

Improve the problem by judging whether the current data_offset
has overlap with the file extent item, and if so, adjusting
offset and extent_len so that we can clone correctly.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/send.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Comments

Filipe Manana March 28, 2019, 10:52 p.m. UTC | #1
On Thu, Mar 28, 2019 at 9:41 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> Improve clone_range two use scenarios.
>
> 1. Remove the limit of clone inode size
> We can do partial clone range, so there is no need to limit
> the inode size.

There is.
Cloning from a source range that goes beyond the source's i_size
results in an -EINVAL error returned from the clone ioctl.

Try running fstests btrfs/007, with a seed value of 1553175244 and
2000 operations instead of 200:

# /home/fdmanana/git/hub/btrfs-progs/btrfs receive
/home/fdmanana/btrfs-tests/scratch_1
ERROR: failed to clone extents to p0/df/d10/f2c: Invalid argument
At snapshot incr
failed: '/home/fdmanana/git/hub/btrfs-progs/btrfs receive
/home/fdmanana/btrfs-tests/scratch_1'


>
> 2. In the scenarios of rewrite or clone_range, data_offset
> rarely matches exactly, so the chance of a clone is missed.
>
> e.g.
>     1. Write a 1M file
>         dd if=/dev/zero of=1M bs=1M count=1
>
>     2. Clone 1M file
>        cp --reflink 1M clone
>
>     3. Rewrite 4k on the clone file
>        dd if=/dev/zero of=clone bs=4k count=1 conv=notrunc
>
>     The disk layout is as follows:
>     item 16 key (257 EXTENT_DATA 0) itemoff 15353 itemsize 53
>         extent data disk byte 1103101952 nr 1048576
>         extent data offset 0 nr 1048576 ram 1048576
>         extent compression(none)
>     ...
>     item 22 key (258 EXTENT_DATA 0) itemoff 14959 itemsize 53
>         extent data disk byte 1104150528 nr 4096
>         extent data offset 0 nr 4096 ram 4096
>         extent compression(none)
>     item 23 key (258 EXTENT_DATA 4096) itemoff 14906 itemsize 53
>         extent data disk byte 1103101952 nr 1048576
>         extent data offset 4096 nr 1044480 ram 1048576
>         extent compression(none)
>
> When send, inode 258 file offset 4096~1048576 (item 23) has a
> chance to clone_range, but because data_offset does not match
> inode 257 (item 16), it causes missed clone and can only transfer
> actual data.
>
> Improve the problem by judging whether the current data_offset
> has overlap with the file extent item, and if so, adjusting
> offset and extent_len so that we can clone correctly.
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/send.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 7ea2d6b..7766b12 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -1240,9 +1240,6 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_)
>         if (ret < 0)
>                 return ret;
>
> -       if (offset + bctx->data_offset + bctx->extent_len > i_size)
> -               return 0;

And this is why the failure above happens.

> -
>         /*
>          * Make sure we don't consider clones from send_root that are
>          * behind the current inode/offset.
> @@ -5148,6 +5145,7 @@ static int clone_range(struct send_ctx *sctx,
>                 u8 type;
>                 u64 ext_len;
>                 u64 clone_len;
> +               u64 clone_data_offset;

  CC [M]  fs/btrfs/send.o
fs/btrfs/send.c: In function 'process_extent':
fs/btrfs/send.c:5218:60: warning: 'clone_data_offset' may be used
uninitialized in this function [-Wmaybe-uninitialized]
   if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
       clone_data_offset == data_offset)
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/send.c:5148:7: note: 'clone_data_offset' was declared here
   u64 clone_data_offset;
       ^~~~~~~~~~~~~~~~~
  LD [M]  fs/btrfs/btrfs.o

$ gcc --version
gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.

Harmless but we shouldn't have warnings, initialize it to something
impossible (u64(-1) / U64_MAX) or re-structure the change below.

>
>                 if (slot >= btrfs_header_nritems(leaf)) {
>                         ret = btrfs_next_leaf(clone_root->root, path);
> @@ -5201,10 +5199,24 @@ static int clone_range(struct send_ctx *sctx,
>                 if (key.offset >= clone_root->offset + len)
>                         break;
>
> +               if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) {
> +                       clone_root->offset = key.offset;
> +                       clone_data_offset = btrfs_file_extent_offset(leaf, ei);
> +                       if (clone_data_offset < data_offset &&
> +                               clone_data_offset + ext_len > data_offset) {
> +                               u64 extent_offset;
> +
> +                               extent_offset = data_offset - clone_data_offset;
> +                               ext_len -= extent_offset;
> +                               clone_data_offset += extent_offset;
> +                               clone_root->offset += extent_offset;
> +                       }
> +               }
> +
>                 clone_len = min_t(u64, ext_len, len);
>
>                 if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
> -                   btrfs_file_extent_offset(leaf, ei) == data_offset)
> +                   clone_data_offset == data_offset)
>                         ret = send_clone(sctx, offset, clone_len, clone_root);
>                 else
>                         ret = send_extent_data(sctx, offset, clone_len);
> --
> 1.9.1
>
robbieko March 29, 2019, 8:26 a.m. UTC | #2
Filipe Manana 於 2019-03-29 06:52 寫到:
> On Thu, Mar 28, 2019 at 9:41 AM robbieko <robbieko@synology.com> wrote:
>> 
>> From: Robbie Ko <robbieko@synology.com>
>> 
>> Improve clone_range two use scenarios.
>> 
>> 1. Remove the limit of clone inode size
>> We can do partial clone range, so there is no need to limit
>> the inode size.
> 
> There is.
> Cloning from a source range that goes beyond the source's i_size
> results in an -EINVAL error returned from the clone ioctl.
> 
> Try running fstests btrfs/007, with a seed value of 1553175244 and
> 2000 operations instead of 200:
> 
> # /home/fdmanana/git/hub/btrfs-progs/btrfs receive
> /home/fdmanana/btrfs-tests/scratch_1
> ERROR: failed to clone extents to p0/df/d10/f2c: Invalid argument
> At snapshot incr
> failed: '/home/fdmanana/git/hub/btrfs-progs/btrfs receive
> /home/fdmanana/btrfs-tests/scratch_1'
> 

This is my fault, fallocate keeps size, which can result in file extent 
range > inode size.
So we still need to check if the file range offset + len is greater than 
the inode size when cloning

Thanks.
I will fix and send Patch v2.

> 
>> 
>> 2. In the scenarios of rewrite or clone_range, data_offset
>> rarely matches exactly, so the chance of a clone is missed.
>> 
>> e.g.
>>     1. Write a 1M file
>>         dd if=/dev/zero of=1M bs=1M count=1
>> 
>>     2. Clone 1M file
>>        cp --reflink 1M clone
>> 
>>     3. Rewrite 4k on the clone file
>>        dd if=/dev/zero of=clone bs=4k count=1 conv=notrunc
>> 
>>     The disk layout is as follows:
>>     item 16 key (257 EXTENT_DATA 0) itemoff 15353 itemsize 53
>>         extent data disk byte 1103101952 nr 1048576
>>         extent data offset 0 nr 1048576 ram 1048576
>>         extent compression(none)
>>     ...
>>     item 22 key (258 EXTENT_DATA 0) itemoff 14959 itemsize 53
>>         extent data disk byte 1104150528 nr 4096
>>         extent data offset 0 nr 4096 ram 4096
>>         extent compression(none)
>>     item 23 key (258 EXTENT_DATA 4096) itemoff 14906 itemsize 53
>>         extent data disk byte 1103101952 nr 1048576
>>         extent data offset 4096 nr 1044480 ram 1048576
>>         extent compression(none)
>> 
>> When send, inode 258 file offset 4096~1048576 (item 23) has a
>> chance to clone_range, but because data_offset does not match
>> inode 257 (item 16), it causes missed clone and can only transfer
>> actual data.
>> 
>> Improve the problem by judging whether the current data_offset
>> has overlap with the file extent item, and if so, adjusting
>> offset and extent_len so that we can clone correctly.
>> 
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>  fs/btrfs/send.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>> 
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index 7ea2d6b..7766b12 100644
>> --- a/fs/btrfs/send.c
>> +++ b/fs/btrfs/send.c
>> @@ -1240,9 +1240,6 @@ static int __iterate_backrefs(u64 ino, u64 
>> offset, u64 root, void *ctx_)
>>         if (ret < 0)
>>                 return ret;
>> 
>> -       if (offset + bctx->data_offset + bctx->extent_len > i_size)
>> -               return 0;
> 
> And this is why the failure above happens.
> 
>> -
>>         /*
>>          * Make sure we don't consider clones from send_root that are
>>          * behind the current inode/offset.
>> @@ -5148,6 +5145,7 @@ static int clone_range(struct send_ctx *sctx,
>>                 u8 type;
>>                 u64 ext_len;
>>                 u64 clone_len;
>> +               u64 clone_data_offset;
> 
>   CC [M]  fs/btrfs/send.o
> fs/btrfs/send.c: In function 'process_extent':
> fs/btrfs/send.c:5218:60: warning: 'clone_data_offset' may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>    if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
>        clone_data_offset == data_offset)
>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/send.c:5148:7: note: 'clone_data_offset' was declared here
>    u64 clone_data_offset;
>        ^~~~~~~~~~~~~~~~~
>   LD [M]  fs/btrfs/btrfs.o
> 

I will fix it in v2 version together.


> $ gcc --version
> gcc --version
> gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
> Copyright (C) 2016 Free Software Foundation, Inc.
> 
> Harmless but we shouldn't have warnings, initialize it to something
> impossible (u64(-1) / U64_MAX) or re-structure the change below.
> 
>> 
>>                 if (slot >= btrfs_header_nritems(leaf)) {
>>                         ret = btrfs_next_leaf(clone_root->root, path);
>> @@ -5201,10 +5199,24 @@ static int clone_range(struct send_ctx *sctx,
>>                 if (key.offset >= clone_root->offset + len)
>>                         break;
>> 
>> +               if (btrfs_file_extent_disk_bytenr(leaf, ei) == 
>> disk_byte) {
>> +                       clone_root->offset = key.offset;
>> +                       clone_data_offset = 
>> btrfs_file_extent_offset(leaf, ei);
>> +                       if (clone_data_offset < data_offset &&
>> +                               clone_data_offset + ext_len > 
>> data_offset) {
>> +                               u64 extent_offset;
>> +
>> +                               extent_offset = data_offset - 
>> clone_data_offset;
>> +                               ext_len -= extent_offset;
>> +                               clone_data_offset += extent_offset;
>> +                               clone_root->offset += extent_offset;
>> +                       }
>> +               }
>> +
>>                 clone_len = min_t(u64, ext_len, len);
>> 
>>                 if (btrfs_file_extent_disk_bytenr(leaf, ei) == 
>> disk_byte &&
>> -                   btrfs_file_extent_offset(leaf, ei) == data_offset)
>> +                   clone_data_offset == data_offset)
>>                         ret = send_clone(sctx, offset, clone_len, 
>> clone_root);
>>                 else
>>                         ret = send_extent_data(sctx, offset, 
>> clone_len);
>> --
>> 1.9.1
>>
kernel test robot March 31, 2019, 9:36 a.m. UTC | #3
Hi robbieko,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on btrfs/next]
[also build test WARNING on v5.1-rc2 next-20190329]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/robbieko/Btrfs-send-improve-clone-range/20190331-162406
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: i386-randconfig-x005-201913 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   fs/btrfs/send.c: In function 'process_extent':
>> fs/btrfs/send.c:5195:60: warning: 'clone_data_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
      if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
          clone_data_offset == data_offset)
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                      
   fs/btrfs/send.c:5125:7: note: 'clone_data_offset' was declared here
      u64 clone_data_offset;
          ^~~~~~~~~~~~~~~~~

vim +/clone_data_offset +5195 fs/btrfs/send.c

d906d49f Filipe Manana      2015-10-02  5048  
d906d49f Filipe Manana      2015-10-02  5049  static int clone_range(struct send_ctx *sctx,
d906d49f Filipe Manana      2015-10-02  5050  		       struct clone_root *clone_root,
d906d49f Filipe Manana      2015-10-02  5051  		       const u64 disk_byte,
d906d49f Filipe Manana      2015-10-02  5052  		       u64 data_offset,
d906d49f Filipe Manana      2015-10-02  5053  		       u64 offset,
d906d49f Filipe Manana      2015-10-02  5054  		       u64 len)
d906d49f Filipe Manana      2015-10-02  5055  {
d906d49f Filipe Manana      2015-10-02  5056  	struct btrfs_path *path;
d906d49f Filipe Manana      2015-10-02  5057  	struct btrfs_key key;
d906d49f Filipe Manana      2015-10-02  5058  	int ret;
d906d49f Filipe Manana      2015-10-02  5059  
72610b1b Filipe Manana      2017-08-10  5060  	/*
72610b1b Filipe Manana      2017-08-10  5061  	 * Prevent cloning from a zero offset with a length matching the sector
72610b1b Filipe Manana      2017-08-10  5062  	 * size because in some scenarios this will make the receiver fail.
72610b1b Filipe Manana      2017-08-10  5063  	 *
72610b1b Filipe Manana      2017-08-10  5064  	 * For example, if in the source filesystem the extent at offset 0
72610b1b Filipe Manana      2017-08-10  5065  	 * has a length of sectorsize and it was written using direct IO, then
72610b1b Filipe Manana      2017-08-10  5066  	 * it can never be an inline extent (even if compression is enabled).
72610b1b Filipe Manana      2017-08-10  5067  	 * Then this extent can be cloned in the original filesystem to a non
72610b1b Filipe Manana      2017-08-10  5068  	 * zero file offset, but it may not be possible to clone in the
72610b1b Filipe Manana      2017-08-10  5069  	 * destination filesystem because it can be inlined due to compression
72610b1b Filipe Manana      2017-08-10  5070  	 * on the destination filesystem (as the receiver's write operations are
72610b1b Filipe Manana      2017-08-10  5071  	 * always done using buffered IO). The same happens when the original
72610b1b Filipe Manana      2017-08-10  5072  	 * filesystem does not have compression enabled but the destination
72610b1b Filipe Manana      2017-08-10  5073  	 * filesystem has.
72610b1b Filipe Manana      2017-08-10  5074  	 */
72610b1b Filipe Manana      2017-08-10  5075  	if (clone_root->offset == 0 &&
72610b1b Filipe Manana      2017-08-10  5076  	    len == sctx->send_root->fs_info->sectorsize)
72610b1b Filipe Manana      2017-08-10  5077  		return send_extent_data(sctx, offset, len);
72610b1b Filipe Manana      2017-08-10  5078  
d906d49f Filipe Manana      2015-10-02  5079  	path = alloc_path_for_send();
d906d49f Filipe Manana      2015-10-02  5080  	if (!path)
d906d49f Filipe Manana      2015-10-02  5081  		return -ENOMEM;
d906d49f Filipe Manana      2015-10-02  5082  
d906d49f Filipe Manana      2015-10-02  5083  	/*
d906d49f Filipe Manana      2015-10-02  5084  	 * We can't send a clone operation for the entire range if we find
d906d49f Filipe Manana      2015-10-02  5085  	 * extent items in the respective range in the source file that
d906d49f Filipe Manana      2015-10-02  5086  	 * refer to different extents or if we find holes.
d906d49f Filipe Manana      2015-10-02  5087  	 * So check for that and do a mix of clone and regular write/copy
d906d49f Filipe Manana      2015-10-02  5088  	 * operations if needed.
d906d49f Filipe Manana      2015-10-02  5089  	 *
d906d49f Filipe Manana      2015-10-02  5090  	 * Example:
d906d49f Filipe Manana      2015-10-02  5091  	 *
d906d49f Filipe Manana      2015-10-02  5092  	 * mkfs.btrfs -f /dev/sda
d906d49f Filipe Manana      2015-10-02  5093  	 * mount /dev/sda /mnt
d906d49f Filipe Manana      2015-10-02  5094  	 * xfs_io -f -c "pwrite -S 0xaa 0K 100K" /mnt/foo
d906d49f Filipe Manana      2015-10-02  5095  	 * cp --reflink=always /mnt/foo /mnt/bar
d906d49f Filipe Manana      2015-10-02  5096  	 * xfs_io -c "pwrite -S 0xbb 50K 50K" /mnt/foo
d906d49f Filipe Manana      2015-10-02  5097  	 * btrfs subvolume snapshot -r /mnt /mnt/snap
d906d49f Filipe Manana      2015-10-02  5098  	 *
d906d49f Filipe Manana      2015-10-02  5099  	 * If when we send the snapshot and we are processing file bar (which
d906d49f Filipe Manana      2015-10-02  5100  	 * has a higher inode number than foo) we blindly send a clone operation
d906d49f Filipe Manana      2015-10-02  5101  	 * for the [0, 100K[ range from foo to bar, the receiver ends up getting
d906d49f Filipe Manana      2015-10-02  5102  	 * a file bar that matches the content of file foo - iow, doesn't match
d906d49f Filipe Manana      2015-10-02  5103  	 * the content from bar in the original filesystem.
d906d49f Filipe Manana      2015-10-02  5104  	 */
d906d49f Filipe Manana      2015-10-02  5105  	key.objectid = clone_root->ino;
d906d49f Filipe Manana      2015-10-02  5106  	key.type = BTRFS_EXTENT_DATA_KEY;
d906d49f Filipe Manana      2015-10-02  5107  	key.offset = clone_root->offset;
d906d49f Filipe Manana      2015-10-02  5108  	ret = btrfs_search_slot(NULL, clone_root->root, &key, path, 0, 0);
d906d49f Filipe Manana      2015-10-02  5109  	if (ret < 0)
d906d49f Filipe Manana      2015-10-02  5110  		goto out;
d906d49f Filipe Manana      2015-10-02  5111  	if (ret > 0 && path->slots[0] > 0) {
d906d49f Filipe Manana      2015-10-02  5112  		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0] - 1);
d906d49f Filipe Manana      2015-10-02  5113  		if (key.objectid == clone_root->ino &&
d906d49f Filipe Manana      2015-10-02  5114  		    key.type == BTRFS_EXTENT_DATA_KEY)
d906d49f Filipe Manana      2015-10-02  5115  			path->slots[0]--;
d906d49f Filipe Manana      2015-10-02  5116  	}
d906d49f Filipe Manana      2015-10-02  5117  
d906d49f Filipe Manana      2015-10-02  5118  	while (true) {
d906d49f Filipe Manana      2015-10-02  5119  		struct extent_buffer *leaf = path->nodes[0];
d906d49f Filipe Manana      2015-10-02  5120  		int slot = path->slots[0];
d906d49f Filipe Manana      2015-10-02  5121  		struct btrfs_file_extent_item *ei;
d906d49f Filipe Manana      2015-10-02  5122  		u8 type;
d906d49f Filipe Manana      2015-10-02  5123  		u64 ext_len;
d906d49f Filipe Manana      2015-10-02  5124  		u64 clone_len;
c4b3268d Robbie Ko          2019-03-28  5125  		u64 clone_data_offset;
d906d49f Filipe Manana      2015-10-02  5126  
d906d49f Filipe Manana      2015-10-02  5127  		if (slot >= btrfs_header_nritems(leaf)) {
d906d49f Filipe Manana      2015-10-02  5128  			ret = btrfs_next_leaf(clone_root->root, path);
d906d49f Filipe Manana      2015-10-02  5129  			if (ret < 0)
d906d49f Filipe Manana      2015-10-02  5130  				goto out;
d906d49f Filipe Manana      2015-10-02  5131  			else if (ret > 0)
d906d49f Filipe Manana      2015-10-02  5132  				break;
d906d49f Filipe Manana      2015-10-02  5133  			continue;
d906d49f Filipe Manana      2015-10-02  5134  		}
d906d49f Filipe Manana      2015-10-02  5135  
d906d49f Filipe Manana      2015-10-02  5136  		btrfs_item_key_to_cpu(leaf, &key, slot);
d906d49f Filipe Manana      2015-10-02  5137  
d906d49f Filipe Manana      2015-10-02  5138  		/*
d906d49f Filipe Manana      2015-10-02  5139  		 * We might have an implicit trailing hole (NO_HOLES feature
d906d49f Filipe Manana      2015-10-02  5140  		 * enabled). We deal with it after leaving this loop.
d906d49f Filipe Manana      2015-10-02  5141  		 */
d906d49f Filipe Manana      2015-10-02  5142  		if (key.objectid != clone_root->ino ||
d906d49f Filipe Manana      2015-10-02  5143  		    key.type != BTRFS_EXTENT_DATA_KEY)
d906d49f Filipe Manana      2015-10-02  5144  			break;
d906d49f Filipe Manana      2015-10-02  5145  
d906d49f Filipe Manana      2015-10-02  5146  		ei = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
d906d49f Filipe Manana      2015-10-02  5147  		type = btrfs_file_extent_type(leaf, ei);
d906d49f Filipe Manana      2015-10-02  5148  		if (type == BTRFS_FILE_EXTENT_INLINE) {
d906d49f Filipe Manana      2015-10-02  5149  			ext_len = btrfs_file_extent_inline_len(leaf, slot, ei);
09cbfeaf Kirill A. Shutemov 2016-04-01  5150  			ext_len = PAGE_ALIGN(ext_len);
d906d49f Filipe Manana      2015-10-02  5151  		} else {
d906d49f Filipe Manana      2015-10-02  5152  			ext_len = btrfs_file_extent_num_bytes(leaf, ei);
d906d49f Filipe Manana      2015-10-02  5153  		}
d906d49f Filipe Manana      2015-10-02  5154  
d906d49f Filipe Manana      2015-10-02  5155  		if (key.offset + ext_len <= clone_root->offset)
d906d49f Filipe Manana      2015-10-02  5156  			goto next;
d906d49f Filipe Manana      2015-10-02  5157  
d906d49f Filipe Manana      2015-10-02  5158  		if (key.offset > clone_root->offset) {
d906d49f Filipe Manana      2015-10-02  5159  			/* Implicit hole, NO_HOLES feature enabled. */
d906d49f Filipe Manana      2015-10-02  5160  			u64 hole_len = key.offset - clone_root->offset;
d906d49f Filipe Manana      2015-10-02  5161  
d906d49f Filipe Manana      2015-10-02  5162  			if (hole_len > len)
d906d49f Filipe Manana      2015-10-02  5163  				hole_len = len;
d906d49f Filipe Manana      2015-10-02  5164  			ret = send_extent_data(sctx, offset, hole_len);
d906d49f Filipe Manana      2015-10-02  5165  			if (ret < 0)
d906d49f Filipe Manana      2015-10-02  5166  				goto out;
d906d49f Filipe Manana      2015-10-02  5167  
d906d49f Filipe Manana      2015-10-02  5168  			len -= hole_len;
d906d49f Filipe Manana      2015-10-02  5169  			if (len == 0)
d906d49f Filipe Manana      2015-10-02  5170  				break;
d906d49f Filipe Manana      2015-10-02  5171  			offset += hole_len;
d906d49f Filipe Manana      2015-10-02  5172  			clone_root->offset += hole_len;
d906d49f Filipe Manana      2015-10-02  5173  			data_offset += hole_len;
d906d49f Filipe Manana      2015-10-02  5174  		}
d906d49f Filipe Manana      2015-10-02  5175  
d906d49f Filipe Manana      2015-10-02  5176  		if (key.offset >= clone_root->offset + len)
d906d49f Filipe Manana      2015-10-02  5177  			break;
d906d49f Filipe Manana      2015-10-02  5178  
c4b3268d Robbie Ko          2019-03-28  5179  		if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) {
c4b3268d Robbie Ko          2019-03-28  5180  			clone_root->offset = key.offset;
c4b3268d Robbie Ko          2019-03-28  5181  			clone_data_offset = btrfs_file_extent_offset(leaf, ei);
c4b3268d Robbie Ko          2019-03-28  5182  			if (clone_data_offset < data_offset &&
c4b3268d Robbie Ko          2019-03-28  5183  				clone_data_offset + ext_len > data_offset) {
c4b3268d Robbie Ko          2019-03-28  5184  				u64 extent_offset;
c4b3268d Robbie Ko          2019-03-28  5185  
c4b3268d Robbie Ko          2019-03-28  5186  				extent_offset = data_offset - clone_data_offset;
c4b3268d Robbie Ko          2019-03-28  5187  				ext_len -= extent_offset;
c4b3268d Robbie Ko          2019-03-28  5188  				clone_data_offset += extent_offset;
c4b3268d Robbie Ko          2019-03-28  5189  				clone_root->offset += extent_offset;
c4b3268d Robbie Ko          2019-03-28  5190  			}
c4b3268d Robbie Ko          2019-03-28  5191  		}
c4b3268d Robbie Ko          2019-03-28  5192  
d906d49f Filipe Manana      2015-10-02  5193  		clone_len = min_t(u64, ext_len, len);
d906d49f Filipe Manana      2015-10-02  5194  
d906d49f Filipe Manana      2015-10-02 @5195  		if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
c4b3268d Robbie Ko          2019-03-28  5196  		    clone_data_offset == data_offset)
d906d49f Filipe Manana      2015-10-02  5197  			ret = send_clone(sctx, offset, clone_len, clone_root);
d906d49f Filipe Manana      2015-10-02  5198  		else
d906d49f Filipe Manana      2015-10-02  5199  			ret = send_extent_data(sctx, offset, clone_len);
d906d49f Filipe Manana      2015-10-02  5200  
d906d49f Filipe Manana      2015-10-02  5201  		if (ret < 0)
d906d49f Filipe Manana      2015-10-02  5202  			goto out;
d906d49f Filipe Manana      2015-10-02  5203  
d906d49f Filipe Manana      2015-10-02  5204  		len -= clone_len;
d906d49f Filipe Manana      2015-10-02  5205  		if (len == 0)
d906d49f Filipe Manana      2015-10-02  5206  			break;
d906d49f Filipe Manana      2015-10-02  5207  		offset += clone_len;
d906d49f Filipe Manana      2015-10-02  5208  		clone_root->offset += clone_len;
d906d49f Filipe Manana      2015-10-02  5209  		data_offset += clone_len;
d906d49f Filipe Manana      2015-10-02  5210  next:
d906d49f Filipe Manana      2015-10-02  5211  		path->slots[0]++;
d906d49f Filipe Manana      2015-10-02  5212  	}
d906d49f Filipe Manana      2015-10-02  5213  
d906d49f Filipe Manana      2015-10-02  5214  	if (len > 0)
d906d49f Filipe Manana      2015-10-02  5215  		ret = send_extent_data(sctx, offset, len);
d906d49f Filipe Manana      2015-10-02  5216  	else
d906d49f Filipe Manana      2015-10-02  5217  		ret = 0;
d906d49f Filipe Manana      2015-10-02  5218  out:
d906d49f Filipe Manana      2015-10-02  5219  	btrfs_free_path(path);
d906d49f Filipe Manana      2015-10-02  5220  	return ret;
d906d49f Filipe Manana      2015-10-02  5221  }
d906d49f Filipe Manana      2015-10-02  5222  

:::::: The code at line 5195 was first introduced by commit
:::::: d906d49fc5f49a0129527e8fbc13312f36b9b9ce Btrfs: send, fix file corruption due to incorrect cloning operations

:::::: TO: Filipe Manana <fdmanana@suse.com>
:::::: CC: Filipe Manana <fdmanana@suse.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
diff mbox series

Patch

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 7ea2d6b..7766b12 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1240,9 +1240,6 @@  static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_)
 	if (ret < 0)
 		return ret;
 
-	if (offset + bctx->data_offset + bctx->extent_len > i_size)
-		return 0;
-
 	/*
 	 * Make sure we don't consider clones from send_root that are
 	 * behind the current inode/offset.
@@ -5148,6 +5145,7 @@  static int clone_range(struct send_ctx *sctx,
 		u8 type;
 		u64 ext_len;
 		u64 clone_len;
+		u64 clone_data_offset;
 
 		if (slot >= btrfs_header_nritems(leaf)) {
 			ret = btrfs_next_leaf(clone_root->root, path);
@@ -5201,10 +5199,24 @@  static int clone_range(struct send_ctx *sctx,
 		if (key.offset >= clone_root->offset + len)
 			break;
 
+		if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) {
+			clone_root->offset = key.offset;
+			clone_data_offset = btrfs_file_extent_offset(leaf, ei);
+			if (clone_data_offset < data_offset &&
+				clone_data_offset + ext_len > data_offset) {
+				u64 extent_offset;
+
+				extent_offset = data_offset - clone_data_offset;
+				ext_len -= extent_offset;
+				clone_data_offset += extent_offset;
+				clone_root->offset += extent_offset;
+			}
+		}
+
 		clone_len = min_t(u64, ext_len, len);
 
 		if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte &&
-		    btrfs_file_extent_offset(leaf, ei) == data_offset)
+		    clone_data_offset == data_offset)
 			ret = send_clone(sctx, offset, clone_len, clone_root);
 		else
 			ret = send_extent_data(sctx, offset, clone_len);