Message ID: 1492545857-3695-1-git-send-email-jbacik@fb.com (mailing list archive)
State: New, archived
On Tue 18-04-17 16:04:17, Josef Bacik wrote:
> By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or
> inode we create. This is problematic because it means it takes two
> trips through the LRU for any of these objects to be reclaimed,
> regardless of their actual lifetime. With enough pressure from these
> caches we can easily evict our working set from page cache with
> single-use objects. So instead only set *REFERENCED if we've already
> been added to the LRU list. This means that we've been touched since
> the first time we were accessed, and so are more likely to need to
> hang out in cache.
>
> To illustrate this issue I wrote the following scripts:
>
> https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure
>
> on my test box. It is a single-socket 4-core CPU with 16GiB of RAM,
> and I tested on an Intel 2TiB NVMe drive. The cache-pressure.sh script
> creates a new file system and creates two 6.5GiB files in order to
> take up 13GiB of the 16GiB of RAM with page cache. Then it runs a test
> program that reads these two files in a loop, and keeps track of how
> many bytes it has to read for each loop. On an ideal system with no
> pressure we should have to read 0 bytes indefinitely. The second thing
> this script does is start an fs_mark job that creates a ton of
> 0-length files, putting pressure on the system with slab-only
> allocations. On exit the script prints out how many bytes were read by
> the read-file program. The results are as follows:
>
> Without patch:
> /mnt/btrfs-test/reads/file1: total read during loops 27262988288
> /mnt/btrfs-test/reads/file2: total read during loops 27262976000
>
> With patch:
> /mnt/btrfs-test/reads/file2: total read during loops 18640457728
> /mnt/btrfs-test/reads/file1: total read during loops 9565376512
>
> This patch results in a 50% reduction in the number of pages evicted
> from our working set.
>
> Signed-off-by: Josef Bacik <jbacik@fb.com>

To me this makes sense, and I was myself wondering in the past why we add
dentries and inodes to the LRU list with the referenced bit set. I would
just note that the patch changes the balance between dentry/inode cache
eviction and page cache eviction (so that pages have a lower chance of
being evicted), and your test measures only the benefit of fewer pages
being evicted. So to be fair you should also complement it with a test
where dentries & inodes actually do get eventually reused, and see how
that works out in the presence of memory pressure from use-once pages in
the page cache.

								Honza

> ---
>  fs/dcache.c | 4 ++--
>  fs/inode.c  | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 95d71ed..cddf397 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -419,6 +419,8 @@ static void dentry_lru_add(struct dentry *dentry)
>  {
>  	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
>  		d_lru_add(dentry);
> +	else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
> +		dentry->d_flags |= DCACHE_REFERENCED;
>  }
>
>  /**
> @@ -779,8 +781,6 @@ void dput(struct dentry *dentry)
>  		goto kill_it;
>  	}
>
> -	if (!(dentry->d_flags & DCACHE_REFERENCED))
> -		dentry->d_flags |= DCACHE_REFERENCED;
>  	dentry_lru_add(dentry);
>
>  	dentry->d_lockref.count--;
> diff --git a/fs/inode.c b/fs/inode.c
> index 88110fd..9dfa8f1 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -405,6 +405,8 @@ static void inode_lru_list_add(struct inode *inode)
>  {
>  	if (list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru))
>  		this_cpu_inc(nr_unused);
> +	else
> +		inode->i_state |= I_REFERENCED;
>  }
>
>  /*
> @@ -1492,7 +1494,6 @@ static void iput_final(struct inode *inode)
>  	drop = generic_drop_inode(inode);
>
>  	if (!drop && (sb->s_flags & MS_ACTIVE)) {
> -		inode->i_state |= I_REFERENCED;
>  		inode_add_lru(inode);
>  		spin_unlock(&inode->i_lock);
>  		return;
> --
> 2.7.4
> On Apr 19, 2017, at 3:02 AM, Jan Kara <jack@suse.cz> wrote:
>
> On Tue 18-04-17 16:04:17, Josef Bacik wrote:
>> By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or
>> inode we create.
>> [...]
>> This patch results in a 50% reduction in the number of pages evicted
>> from our working set.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>
> To me this makes sense, and I was myself wondering in the past why we
> add dentries and inodes to the LRU list with the referenced bit set. I
> would just note that the patch changes the balance between dentry/inode
> cache eviction and page cache eviction (so that pages have a lower
> chance of being evicted), and your test measures only the benefit of
> fewer pages being evicted. So to be fair you should also complement it
> with a test where dentries & inodes actually do get eventually reused,
> and see how that works out in the presence of memory pressure from
> use-once pages in the page cache.

Yup, that's fair. I wrote up a different script to exercise the warm
icache/dcache vs. use-once pages; the scripts are in the same place as
my original script:

https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure

Basically: create 3 million empty inodes, create a 100GiB sparse file,
then stat all the files in a loop while cat'ing the sparse file to
/dev/null. There's a bcc script in there to count how many times the
stat'ing programs allocate a new inode, and both with and without my
patch the test exited with those processes creating 0 inodes. This makes
sense: we create the files and they will not have *REFERENCED, and then
as soon as the files are read in again they will be marked REFERENCED,
so they become just as hard to evict as they were originally.

Thanks,

Josef
On Wed 19-04-17 14:02:56, Josef Bacik wrote:
> > On Apr 19, 2017, at 3:02 AM, Jan Kara <jack@suse.cz> wrote:
> > [...]
> > So to be fair you should also complement it with a test where
> > dentries & inodes actually do get eventually reused, and see how that
> > works out in the presence of memory pressure from use-once pages in
> > the page cache.
>
> Yup, that's fair. I wrote up a different script to exercise the warm
> icache/dcache vs. use-once pages; the scripts are in the same place as
> my original script:
>
> https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure
>
> Basically: create 3 million empty inodes, create a 100GiB sparse file,
> then stat all the files in a loop while cat'ing the sparse file to
> /dev/null. There's a bcc script in there to count how many times the
> stat'ing programs allocate a new inode, and both with and without my
> patch the test exited with those processes creating 0 inodes. This
> makes sense: we create the files and they will not have *REFERENCED,
> and then as soon as the files are read in again they will be marked
> REFERENCED, so they become just as hard to evict as they were
> originally.

OK, sounds good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Tue, Apr 18, 2017 at 04:04:17PM -0400, Josef Bacik wrote:
> By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or
> inode we create.
> [...]
> This patch results in a 50% reduction in the number of pages evicted
> from our working set.
>
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Hello Al,

Can we get this included in the upcoming merge window?

Thanks,

Josef
diff --git a/fs/dcache.c b/fs/dcache.c
index 95d71ed..cddf397 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -419,6 +419,8 @@ static void dentry_lru_add(struct dentry *dentry)
 {
 	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
 		d_lru_add(dentry);
+	else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
+		dentry->d_flags |= DCACHE_REFERENCED;
 }

 /**
@@ -779,8 +781,6 @@ void dput(struct dentry *dentry)
 		goto kill_it;
 	}

-	if (!(dentry->d_flags & DCACHE_REFERENCED))
-		dentry->d_flags |= DCACHE_REFERENCED;
 	dentry_lru_add(dentry);

 	dentry->d_lockref.count--;
diff --git a/fs/inode.c b/fs/inode.c
index 88110fd..9dfa8f1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -405,6 +405,8 @@ static void inode_lru_list_add(struct inode *inode)
 {
 	if (list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru))
 		this_cpu_inc(nr_unused);
+	else
+		inode->i_state |= I_REFERENCED;
 }

 /*
@@ -1492,7 +1494,6 @@ static void iput_final(struct inode *inode)
 	drop = generic_drop_inode(inode);

 	if (!drop && (sb->s_flags & MS_ACTIVE)) {
-		inode->i_state |= I_REFERENCED;
 		inode_add_lru(inode);
 		spin_unlock(&inode->i_lock);
 		return;
By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or
inode we create. This is problematic because it means it takes two trips
through the LRU for any of these objects to be reclaimed, regardless of
their actual lifetime. With enough pressure from these caches we can
easily evict our working set from page cache with single-use objects. So
instead only set *REFERENCED if we've already been added to the LRU
list. This means that we've been touched since the first time we were
accessed, and so are more likely to need to hang out in cache.

To illustrate this issue I wrote the following scripts:

https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure

on my test box. It is a single-socket 4-core CPU with 16GiB of RAM, and
I tested on an Intel 2TiB NVMe drive. The cache-pressure.sh script
creates a new file system and creates two 6.5GiB files in order to take
up 13GiB of the 16GiB of RAM with page cache. Then it runs a test
program that reads these two files in a loop, and keeps track of how
many bytes it has to read for each loop. On an ideal system with no
pressure we should have to read 0 bytes indefinitely. The second thing
this script does is start an fs_mark job that creates a ton of 0-length
files, putting pressure on the system with slab-only allocations. On
exit the script prints out how many bytes were read by the read-file
program. The results are as follows:

Without patch:
/mnt/btrfs-test/reads/file1: total read during loops 27262988288
/mnt/btrfs-test/reads/file2: total read during loops 27262976000

With patch:
/mnt/btrfs-test/reads/file2: total read during loops 18640457728
/mnt/btrfs-test/reads/file1: total read during loops 9565376512

This patch results in a 50% reduction in the number of pages evicted
from our working set.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/dcache.c | 4 ++--
 fs/inode.c  | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)