
mixed inline, non-inline extents leading to EIO when reading small files

Message ID 20160526172628.pd6ps5ieuxbxwno7@floor.thefacebook.com (mailing list archive)
State New, archived

Commit Message

Chris Mason May 26, 2016, 5:26 p.m. UTC
On Thu, May 26, 2016 at 12:19:52PM -0400, Zygo Blaxell wrote:
> I frequently see these in /etc/lvm/backup/*.  Something that LVM does
> when it writes these files triggers the problem.  This problem occurs
> in kernels 3.18..4.4.11 (i.e. all the kernels I've tested).
> 
> btrfs-debug-tree finds this:
> 
>         item 26 key (2702988 INODE_ITEM 0) itemoff 12632 itemsize 160
>                 inode generation 49642 transid 49799 size 7856 nbytes 8192
>                 block group 0 mode 100644 links 1 uid 0 gid 0
>                 rdev 0 flags 0x0
>         item 27 key (2702988 INODE_REF 2799) itemoff 12617 itemsize 15
>                 inode ref index 4 namelen 5 name: volgr
>         item 28 key (2702988 EXTENT_DATA 0) itemoff 11247 itemsize 1370
>                 inline extent data size 1349 ram 4096 compress(zlib)
>         item 29 key (2702988 EXTENT_DATA 4096) itemoff 11194 itemsize 53
>                 extent data disk byte 1161560064 nr 4096
>                 extent data offset 0 nr 4096 ram 4096
>                 extent compression(none)
> 
> When the problem occurs it usually affects all files in /etc/lvm/backup.
> I have seen it randomly in other parts of the filesystem but it's much
> rarer elsewhere.
> 
> Attempts to read this file return EIO.  There are no errors reported in
> scrub or kmesg.
> 
> Filesystem is mounted with options:
> 
> 	noatime,compress-force=zlib,flushoncommit,space_cache,skip_balance,commit=300
> 
> Am I missing anything?

I've got this queued up to send out.  We hit this with holes instead of
extents, and without compression, but it's a similar problem:

Btrfs: deal with duplicates during extent_map insertion in btrfs_get_extent

When dealing with inline extents, btrfs_get_extent will incorrectly try
to insert a duplicate extent_map.  The dup hits -EEXIST from
add_extent_map, but then we try to merge with the existing one and end
up trying to insert a zero length extent_map.
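
As a rough userspace illustration of the duplicate test described above, the
sketch below compares two simplified mappings the same way the patch does:
same start, same end, same block_start.  The struct em_sketch type and the
em_sketch_* helpers are made up for this example and are not kernel
interfaces; the real struct extent_map carries more fields, and the real
extent_map_end() handles overflow differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct extent_map (illustrative only). */
struct em_sketch {
	uint64_t start;       /* file offset where the mapping begins */
	uint64_t len;         /* length of the mapping in bytes */
	uint64_t block_start; /* on-disk start, or a marker value for inlines */
};

/* End offset of a mapping; this sketch simply refuses overflowing ranges. */
static uint64_t em_sketch_end(const struct em_sketch *em)
{
	assert(em->len <= UINT64_MAX - em->start);
	return em->start + em->len;
}

/*
 * True when both mappings describe exactly the same range and the same
 * backing location, i.e. the new map is a duplicate of the existing one
 * and can be dropped in favor of it.
 */
static bool em_sketch_is_duplicate(const struct em_sketch *existing,
				   const struct em_sketch *em)
{
	return existing->start == em->start &&
	       em_sketch_end(existing) == em_sketch_end(em) &&
	       existing->block_start == em->block_start;
}
```

When em_sketch_is_duplicate() returns true, the caller would keep the
existing mapping and treat the insertion as successful instead of trying to
merge the two.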

This actually works most of the time, except when there are extent maps
past the end of the inline extent.  rocksdb will trigger this sometimes
because it preallocates an extent and then truncates down.

Josef made a script to trigger with xfs_io:

#!/bin/bash

xfs_io -f -c "pwrite 0 1000" inline
xfs_io -c "falloc -k 4k 1M" inline
xfs_io -c "pread 0 1000" -c "fadvise -d 0 1000" -c "pread 0 1000" inline
xfs_io -c "fadvise -d 0 1000" inline
cat inline

You'll get EIOs trying to read inline after this because add_extent_map
is returning -EEXIST.
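
To check for the failure from a program rather than with cat, a small helper
along these lines could be used.  It is only a sketch: the name
read_whole_file_errno is hypothetical, and it assumes a POSIX system.  It
reads the file to the end and returns 0 on success or the errno value from
the failing open()/read(), which would be EIO on an affected file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Read the file at 'path' from start to end, discarding the data.
 * Returns 0 if every read succeeds, otherwise the errno value reported
 * by the failing open() or read() call.
 */
static int read_whole_file_errno(const char *path)
{
	char buf[4096];
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	/* read() returns 0 at end of file and -1 on error. */
	while ((n = read(fd, buf, sizeof(buf))) != 0) {
		if (n < 0) {
			int saved;

			if (errno == EINTR)
				continue; /* interrupted by a signal, retry */
			saved = errno;
			close(fd);
			return saved;
		}
	}

	close(fd);
	return 0;
}
```

Calling read_whole_file_errno("inline") after running the script and
comparing the result against EIO gives the same signal as the failing cat.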

Signed-off-by: Chris Mason <clm@fb.com>


Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 98a3ba2..4352589 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6914,7 +6914,18 @@  insert:
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (start >= extent_map_end(existing) ||
+		if (existing->start == em->start &&
+		    extent_map_end(existing) == extent_map_end(em) &&
+		    em->block_start == existing->block_start) {
+			/*
+			 * these two extents are the same, it happens
+			 * with inlines especially
+			 */
+			free_extent_map(em);
+			em = existing;
+			err = 0;
+
+		} else if (start >= extent_map_end(existing) ||
 		    start <= existing->start) {
 			/*
 			 * The existing extent map is the one nearest to