diff mbox

[v2] Btrfs: fix unexpected file hole after disk errors

Message ID 1488831810-26684-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Liu Bo March 6, 2017, 8:23 p.m. UTC
Btrfs creates hole extents to cover any unwritten section right before
doing buffer writes after commit 3ac0d7b96a26 ("btrfs: Change the expanding
write sequence to fix snapshot related bug.").

However, that takes the start position of the buffered write to compare
against the current EOF, hole extents would be created only if (EOF <
start).

If the EOF is at the middle of the buffered write, no hole extents will be
created and a file hole without a hole extent is left in this file.

This bug was revealed by generic/019 in fstests.  'fsstress' in this test
may create the above situation and the test then fails all requests
including writes, so the buffer write which is supposed to cover the
hole (without the hole extent) couldn't make it on disk.  Running fsck
against such btrfs ends up with detecting file extent holes.

Things could be more serious, some stale data would be exposed to
userspace if files with this kind of hole are truncated to a position of
the hole, because the on-disk inode size is beyond the last extent in the
file.

This fixes the bug by comparing the end position against the EOF.

Cc: David Sterba <dsterba@suse.cz>
Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: update comments to be precise.

 fs/btrfs/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Qu Wenruo March 7, 2017, 12:28 a.m. UTC | #1
At 03/07/2017 04:23 AM, Liu Bo wrote:
> Btrfs creates hole extents to cover any unwritten section right before
> doing buffer writes after commit 3ac0d7b96a26 ("btrfs: Change the expanding
> write sequence to fix snapshot related bug.").
>
> However, that takes the start position of the buffered write to compare
> against the current EOF, hole extents would be created only if (EOF <
> start).
>
> If the EOF is at the middle of the buffered write, no hole extents will be
> created and a file hole without a hole extent is left in this file.
>
> This bug was revealed by generic/019 in fstests.  'fsstress' in this test
> may create the above situation and the test then fails all requests
> including writes, so the buffer write which is supposed to cover the
> hole (without the hole extent) couldn't make it on disk.  Running fsck
> against such btrfs ends up with detecting file extent holes.
>
> Things could be more serious, some stale data would be exposed to
> userspace if files with this kind of hole are truncated to a position of
> the hole, because the on-disk inode size is beyond the last extent in the
> file.
>
> This fixes the bug by comparing the end position against the EOF.
>
> Cc: David Sterba <dsterba@suse.cz>
> Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Looks good.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Thanks,
Qu
> ---
> v2: update comments to be precise.
>
>  fs/btrfs/file.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 520cb72..dcf0286 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1865,11 +1865,13 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>  	pos = iocb->ki_pos;
>  	count = iov_iter_count(from);
>  	start_pos = round_down(pos, fs_info->sectorsize);
> +	end_pos = round_up(pos + count, fs_info->sectorsize);
>  	oldsize = i_size_read(inode);
> -	if (start_pos > oldsize) {
> -		/* Expand hole size to cover write data, preventing empty gap */
> -		end_pos = round_up(pos + count,
> -				   fs_info->sectorsize);
> +	if (end_pos > oldsize) {
> +		/*
> +		 * Expand hole size to cover write data in order to prevent an
> +		 * empty gap in case of a write failure.
> +		 */
>  		err = btrfs_cont_expand(inode, oldsize, end_pos);
>  		if (err) {
>  			inode_unlock(inode);
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba March 28, 2017, 12:50 p.m. UTC | #2
On Mon, Mar 06, 2017 at 12:23:30PM -0800, Liu Bo wrote:
> Btrfs creates hole extents to cover any unwritten section right before
> doing buffer writes after commit 3ac0d7b96a26 ("btrfs: Change the expanding
> write sequence to fix snapshot related bug.").
> 
> However, that takes the start position of the buffered write to compare
> against the current EOF, hole extents would be created only if (EOF <
> start).
> 
> If the EOF is at the middle of the buffered write, no hole extents will be
> created and a file hole without a hole extent is left in this file.
> 
> This bug was revealed by generic/019 in fstests.  'fsstress' in this test
> may create the above situation and the test then fails all requests
> including writes, so the buffer write which is supposed to cover the
> hole (without the hole extent) couldn't make it on disk.  Running fsck
> against such btrfs ends up with detecting file extent holes.
> 
> Things could be more serious, some stale data would be exposed to
> userspace if files with this kind of hole are truncated to a position of
> the hole, because the on-disk inode size is beyond the last extent in the
> file.
> 
> This fixes the bug by comparing the end position against the EOF.

Is the test reliable? As I read it, it should be possible to craft the
file extents and trigger the bug. And verify the fix.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo March 28, 2017, 6:40 p.m. UTC | #3
On Tue, Mar 28, 2017 at 02:50:06PM +0200, David Sterba wrote:
> On Mon, Mar 06, 2017 at 12:23:30PM -0800, Liu Bo wrote:
> > Btrfs creates hole extents to cover any unwritten section right before
> > doing buffer writes after commit 3ac0d7b96a26 ("btrfs: Change the expanding
> > write sequence to fix snapshot related bug.").
> > 
> > However, that takes the start position of the buffered write to compare
> > against the current EOF, hole extents would be created only if (EOF <
> > start).
> > 
> > If the EOF is at the middle of the buffered write, no hole extents will be
> > created and a file hole without a hole extent is left in this file.
> > 
> > This bug was revealed by generic/019 in fstests.  'fsstress' in this test
> > may create the above situation and the test then fails all requests
> > including writes, so the buffer write which is supposed to cover the
> > hole (without the hole extent) couldn't make it on disk.  Running fsck
> > against such btrfs ends up with detecting file extent holes.
> > 
> > Things could be more serious, some stale data would be exposed to
> > userspace if files with this kind of hole are truncated to a position of
> > the hole, because the on-disk inode size is beyond the last extent in the
> > file.
> > 
> > This fixes the bug by comparing the end position against the EOF.
> 
> Is the test reliable? As I read it, it should be possible to craft the
> file extents and trigger the bug. And verify the fix.

It's not, running generic/019 by 10 times could produces a btrfsck
error once on my box.

Actually I think we don't need this patch any more since we're going
to remove hole extents.

I made a mistake when writing the above 'more serious' part, in fact
truncating to a hole doesn't end up stale data as we don't have any
file extent that points to the hole and reading the hole part should
get all zero, thus there's no serious problem.

It was another bug about setting disk isize to zero accidentally and
data was lost, which has been fixed.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 520cb72..dcf0286 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1865,11 +1865,13 @@  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	pos = iocb->ki_pos;
 	count = iov_iter_count(from);
 	start_pos = round_down(pos, fs_info->sectorsize);
+	end_pos = round_up(pos + count, fs_info->sectorsize);
 	oldsize = i_size_read(inode);
-	if (start_pos > oldsize) {
-		/* Expand hole size to cover write data, preventing empty gap */
-		end_pos = round_up(pos + count,
-				   fs_info->sectorsize);
+	if (end_pos > oldsize) {
+		/*
+		 * Expand hole size to cover write data in order to prevent an
+		 * empty gap in case of a write failure.
+		 */
 		err = btrfs_cont_expand(inode, oldsize, end_pos);
 		if (err) {
 			inode_unlock(inode);