diff mbox

Btrfs: fix performance regression of writing to prealloc/nocow file

Message ID

1458252975-1349-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)

State

New, archived

Headers

show

From: Liu Bo <bo.li.liu@oracle.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] Btrfs: fix performance regression of writing to
	prealloc/nocow file
Date: Thu, 17 Mar 2016 15:16:15 -0700
Message-Id: <1458252975-1349-1-git-send-email-bo.li.liu@oracle.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Liu Bo March 17, 2016, 10:16 p.m. UTC

For nocow/prealloc files, we try our best to not allocate space, however,
this ends up a huge performance regression since it's expensive to check
if data is shared.

Let's go back to only go check shared data once we're not able to allocate
space.

The test was made against a tmpfs backed loop device:

xfs_io -f -c "falloc 0 400M" foobar
xfs_io -c "pwrite -W -b 4096 0 400M" foobar

Vanilla:
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)

Patched kernel:
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/file.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

Comments

Holger Hoffstätte March 18, 2016, 11:59 a.m. UTC | #1

(sorry for any duplicates, vger.org hates gmail)

On Thu, Mar 17, 2016 at 11:16 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> For nocow/prealloc files, we try our best to not allocate space, however,
> this ends up a huge performance regression since it's expensive to check
> if data is shared.
>
> Let's go back to only go check shared data once we're not able to allocate
> space.
>
> The test was made against a tmpfs backed loop device:
>
> xfs_io -f -c "falloc 0 400M" foobar
> xfs_io -c "pwrite -W -b 4096 0 400M" foobar
>
> Vanilla:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
>
> Patched kernel:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
(snip)

Doesn't this effectively do the same as

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=integration-4.6&id=4da2e26a2a32b174878744bd0f07db180c875f26

but in a slightly different way?

With a 4.4++ kernel (btrfs ~4.6) and the above patch I don't see this bad
performance (via loop on tmpfs):

$xfs_io -c "pwrite -W -b 4096 0 400M" foobar
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0.0000 sec (831.038 MiB/sec and 212745.6235 ops/sec)

Rewriting is even faster:

$xfs_io -c "pwrite -W -b 4096 0 400M" foobar
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0.0000 sec (1.124 GiB/sec and 294713.5752 ops/sec)

Looks decent enough considering the fsync.

-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Roman Mamedov March 18, 2016, 12:44 p.m. UTC | #2

On Thu, 17 Mar 2016 15:16:15 -0700
Liu Bo <bo.li.liu@oracle.com> wrote:

> For nocow/prealloc files, we try our best to not allocate space, however,
> this ends up a huge performance regression since it's expensive to check
> if data is shared.
> 
> Let's go back to only go check shared data once we're not able to allocate
> space.

As mentioned by Holger there's another patch modifying the same function:
https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=integration-4.6&id=4da2e26a2a32b174878744bd0f07db180c875f26
and that one fixes a serious regression in the 4.4 series.
I did not try yours, but could you please ensure that you do not hit the
described problem with your version of the patch, or perhaps even better,
consider re-testing performance and then basing any further performance
optimizations upon a state with no known grave functionality bugs (i.e. with
the above patch applied).

Thanks

> The test was made against a tmpfs backed loop device:
> 
> xfs_io -f -c "falloc 0 400M" foobar
> xfs_io -c "pwrite -W -b 4096 0 400M" foobar
> 
> Vanilla:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
> 
> Patched kernel:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/file.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 098bb8f..f66c1bc 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1525,16 +1525,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  
>  		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
>  
> -		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> -					     BTRFS_INODE_PREALLOC)) {
> +		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> +		if (ret == -ENOSPC &&
> +		    BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> +						     BTRFS_INODE_PREALLOC)) {
>  			ret = check_can_nocow(inode, pos, &write_bytes);
> -			if (ret < 0)
> -				break;
>  			if (ret > 0) {
> -				/*
> -				 * For nodata cow case, no need to reserve
> -				 * data space.
> -				 */
>  				only_release_metadata = true;
>  				/*
>  				 * our prealloc extent may be smaller than
> @@ -1543,10 +1539,13 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  				num_pages = DIV_ROUND_UP(write_bytes + offset,
>  							 PAGE_CACHE_SIZE);
>  				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
> +
> +				ret = 0;
>  				goto reserve_metadata;
> +			} else {
> +				ret = -ENOSPC;
>  			}
>  		}
> -		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
>  		if (ret < 0)
>  			break;
>

Liu Bo March 21, 2016, 7:10 p.m. UTC | #3

Hi,

On Fri, Mar 18, 2016 at 05:44:03PM +0500, Roman Mamedov wrote:
> On Thu, 17 Mar 2016 15:16:15 -0700
> Liu Bo <bo.li.liu@oracle.com> wrote:
> 
> > For nocow/prealloc files, we try our best to not allocate space, however,
> > this ends up a huge performance regression since it's expensive to check
> > if data is shared.
> > 
> > Let's go back to only go check shared data once we're not able to allocate
> > space.
> 
> As mentioned by Holger there's another patch modifying the same function:
> https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=integration-4.6&id=4da2e26a2a32b174878744bd0f07db180c875f26
> and that one fixes a serious regression in the 4.4 series.
> I did not try yours, but could you please ensure that you do not hit the
> described problem with your version of the patch, or perhaps even better,
> consider re-testing performance and then basing any further performance
> optimizations upon a state with no known grave functionality bugs (i.e. with
> the above patch applied).

I've verified with the above test that my patch doesn't introduce a
regression although there're some patch conflicts to deal with.

In fact they're fixing two different problems.

Zhao's patch is fixing the incorrect behavior of parsing error caused by
creating snapshots, while my patch is to go back to reserving space for
nocow/prealloc writes unless we've run out of space.

Thanks,

-liubo
> 
> Thanks
> 
> > The test was made against a tmpfs backed loop device:
> > 
> > xfs_io -f -c "falloc 0 400M" foobar
> > xfs_io -c "pwrite -W -b 4096 0 400M" foobar
> > 
> > Vanilla:
> > wrote 419430400/419430400 bytes at offset 0
> > 400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
> > 
> > Patched kernel:
> > wrote 419430400/419430400 bytes at offset 0
> > 400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/file.c | 17 ++++++++---------
> >  1 file changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 098bb8f..f66c1bc 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1525,16 +1525,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> >  
> >  		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
> >  
> > -		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> > -					     BTRFS_INODE_PREALLOC)) {
> > +		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> > +		if (ret == -ENOSPC &&
> > +		    BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> > +						     BTRFS_INODE_PREALLOC)) {
> >  			ret = check_can_nocow(inode, pos, &write_bytes);
> > -			if (ret < 0)
> > -				break;
> >  			if (ret > 0) {
> > -				/*
> > -				 * For nodata cow case, no need to reserve
> > -				 * data space.
> > -				 */
> >  				only_release_metadata = true;
> >  				/*
> >  				 * our prealloc extent may be smaller than
> > @@ -1543,10 +1539,13 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> >  				num_pages = DIV_ROUND_UP(write_bytes + offset,
> >  							 PAGE_CACHE_SIZE);
> >  				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
> > +
> > +				ret = 0;
> >  				goto reserve_metadata;
> > +			} else {
> > +				ret = -ENOSPC;
> >  			}
> >  		}
> > -		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> >  		if (ret < 0)
> >  			break;
> >  
> 
> 
> -- 
> With respect,
> Roman


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff mbox

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 098bb8f..f66c1bc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1525,16 +1525,12 @@  static noinline ssize_t __btrfs_buffered_write(struct file *file,
 
 		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
 
-		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
-					     BTRFS_INODE_PREALLOC)) {
+		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
+		if (ret == -ENOSPC &&
+		    BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+						     BTRFS_INODE_PREALLOC)) {
 			ret = check_can_nocow(inode, pos, &write_bytes);
-			if (ret < 0)
-				break;
 			if (ret > 0) {
-				/*
-				 * For nodata cow case, no need to reserve
-				 * data space.
-				 */
 				only_release_metadata = true;
 				/*
 				 * our prealloc extent may be smaller than
@@ -1543,10 +1539,13 @@  static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				num_pages = DIV_ROUND_UP(write_bytes + offset,
 							 PAGE_CACHE_SIZE);
 				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+
+				ret = 0;
 				goto reserve_metadata;
+			} else {
+				ret = -ENOSPC;
 			}
 		}
-		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
 		if (ret < 0)
 			break;