diff mbox

[v2] Btrfs: fix range cloning when same inode used as source and destination

Message ID 1427995511-13028-1-git-send-email-fdmanana@suse.com (mailing list archive)
State Accepted
Headers show

Commit Message

Filipe Manana April 2, 2015, 5:25 p.m. UTC
While searching for extents to clone we might find one where we only use
a part of it coming from its tail. If our destination inode is the same
the source inode, we end up removing the tail part of the extent item and
insert after a new one that point to the same extent with an adjusted
key file offset and data offset. After this we search for the next extent
item in the fs/subvol tree with a key that has an offset incremented by
one. But this second search leaves us at the new extent item we inserted
previously, and since that extent item has a non-zero data offset, it
it can make us call btrfs_drop_extents with an empty range (start == end)
which causes the following warning:

[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
(...)
[23978.557266] Call Trace:
[23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
[23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
[23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
[23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
[23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
[23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
[23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
[23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
[23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
[23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
[23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
[23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---

Then we attempt to insert a new extent item with a key that already
exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
abortion of the current transaction:

[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
(...)
[23978.622589] Call Trace:
[23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
[23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
[23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---

This is wrong because we should not process the extent item that we just
inserted previously, and instead process the extent item that follows it
in the tree

For example for the test case I wrote for fstests:

   bs=$((64 * 1024))
   mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
   mount /dev/sdc /mnt

   xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo

   $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
   $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo

The second clone call fails with -EEXIST, because when we process the
first extent item (offset 262144), we drop part of it (counting from the
end) and then insert a new extent item with a key greater then the key we
found. The next time we search the tree we search for a key with offset
262144 + 1, which leaves us at the new extent item we have just inserted
but we think it refers to an extent that we need to clone.

Fix this by ensuring the next search key uses an offset corresponding to
the offset of the key we found previously plus the data length of the
corresponding extent item. This ensures we skip new extent items that we
inserted and works for the case of implicit holes too (NO_HOLES feature).

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Fixed a warning about potentially uninitialized variable. David
    got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc
    however.

 fs/btrfs/ioctl.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Holger Hoffstätte April 2, 2015, 5:31 p.m. UTC | #1
On Thu, 02 Apr 2015 18:25:11 +0100, Filipe Manana wrote:

> V2: Fixed a warning about potentially uninitialized variable. David
>     got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc
>     however.

I was *just* about to post this warning, since I saw it only a minute ago!

I assume you mean:

fs/btrfs/ioctl.c: In function 'btrfs_clone':
fs/btrfs/ioctl.c:3531:14: warning: 'next_key_min_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
   key.offset = next_key_min_offset;
              ^

..and this is with 4.9.2 here.

Anyway..thanks for being faster :)

Holger

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana April 2, 2015, 6:03 p.m. UTC | #2
On Thu, Apr 2, 2015 at 6:31 PM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
>
> On Thu, 02 Apr 2015 18:25:11 +0100, Filipe Manana wrote:
>
>> V2: Fixed a warning about potentially uninitialized variable. David
>>     got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc
>>     however.
>
> I was *just* about to post this warning, since I saw it only a minute ago!
>
> I assume you mean:
>
> fs/btrfs/ioctl.c: In function 'btrfs_clone':
> fs/btrfs/ioctl.c:3531:14: warning: 'next_key_min_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
>    key.offset = next_key_min_offset;
>               ^
>
> ..and this is with 4.9.2 here.
>
> Anyway..thanks for being faster :)

Yes, that was it.
Thanks.

>
> Holger
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 09a566a..029f4da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3254,6 +3254,7 @@  process_slot:
 			u64 datao = 0, datal = 0;
 			u8 comp;
 			u64 drop_start;
+			u64 next_key_min_offset;
 
 			extent = btrfs_item_ptr(leaf, slot,
 						struct btrfs_file_extent_item);
@@ -3285,7 +3286,7 @@  process_slot:
 			} else if (key.offset >= off + len) {
 				break;
 			}
-
+			next_key_min_offset = key.offset + datal;
 			size = btrfs_item_size_nr(leaf, slot);
 			read_extent_buffer(leaf, buf,
 					   btrfs_item_ptr_offset(leaf, slot),
@@ -3498,9 +3499,11 @@  process_slot:
 				goto out;
 			if (new_key.offset + datal >= destoff + len)
 				break;
+			key.offset = next_key_min_offset;
+		} else {
+			key.offset++;
 		}
 		btrfs_release_path(path);
-		key.offset++;
 	}
 	ret = 0;