diff mbox

[8/8] NFS: allow lockless access to access_cache

Message ID 20140305030028.27421.87633.stgit@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown March 5, 2014, 3 a.m. UTC
The access cache is used during RCU-walk path lookups, so it is best
to avoid locking if possible as taking a lock kills concurrency.

The rbtree is not rcu-safe and cannot easily be made so.
Instead we simply check the last (i.e. most recent) entry on the LRU
list.  If this doesn't match, then we return -ECHILD and retry in
lock/refcount mode.

This requires freeing the nfs_access_entry struct with rcu, and
requires using rcu access primates when adding entries to the lru, and
when examining the last entry.

Calling put_rpccred before kfree_rcu looks a bit odd, but as
put_rpccred already provide rcu protection, we know that the cred will
not actually be freed until the next grace period, so any concurrent
access will be safe.

This patch provides about 5% performance improvement on a stat-heavy
synthetic work load with 4 threads on a 2-core CPU.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/dir.c           |   41 +++++++++++++++++++++++++++++++++++++++--
 include/linux/nfs_fs.h |    1 +
 2 files changed, 40 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Trond Myklebust March 5, 2014, 3:11 p.m. UTC | #1
On Mar 4, 2014, at 22:00, NeilBrown <neilb@suse.de> wrote:

> The access cache is used during RCU-walk path lookups, so it is best
> to avoid locking if possible as taking a lock kills concurrency.
> 
> The rbtree is not rcu-safe and cannot easily be made so.
> Instead we simply check the last (i.e. most recent) entry on the LRU
> list.  If this doesn't match, then we return -ECHILD and retry in
> lock/refcount mode.
> 
> This requires freeing the nfs_access_entry struct with rcu, and
> requires using rcu access primates when adding entries to the lru, and
> when examining the last entry.
> 
> Calling put_rpccred before kfree_rcu looks a bit odd, but as
> put_rpccred already provide rcu protection, we know that the cred will
> not actually be freed until the next grace period, so any concurrent
> access will be safe.
> 
> This patch provides about 5% performance improvement on a stat-heavy
> synthetic work load with 4 threads on a 2-core CPU.
> 

Have you looked at converting the rb-tree into something a little more RCU-friendly? Since our lookup key is just a pointer value, couldn’t we use a radix-tree?
NeilBrown March 6, 2014, 6:03 a.m. UTC | #2
On Wed, 5 Mar 2014 10:11:02 -0500 Trond Myklebust
<trond.myklebust@primarydata.com> wrote:

> 
> On Mar 4, 2014, at 22:00, NeilBrown <neilb@suse.de> wrote:
> 
> > The access cache is used during RCU-walk path lookups, so it is best
> > to avoid locking if possible as taking a lock kills concurrency.
> > 
> > The rbtree is not rcu-safe and cannot easily be made so.
> > Instead we simply check the last (i.e. most recent) entry on the LRU
> > list.  If this doesn't match, then we return -ECHILD and retry in
> > lock/refcount mode.
> > 
> > This requires freeing the nfs_access_entry struct with rcu, and
> > requires using rcu access primates when adding entries to the lru, and
> > when examining the last entry.
> > 
> > Calling put_rpccred before kfree_rcu looks a bit odd, but as
> > put_rpccred already provide rcu protection, we know that the cred will
> > not actually be freed until the next grace period, so any concurrent
> > access will be safe.
> > 
> > This patch provides about 5% performance improvement on a stat-heavy
> > synthetic work load with 4 threads on a 2-core CPU.
> > 
> 
> Have you looked at converting the rb-tree into something a little more RCU-friendly? Since our lookup key is just a pointer value, couldn’t we use a radix-tree?
> 

Interesting idea.  The radix tree would be fairly sparse unless we divided
the address by the size of the structure ... would that hit alignment issues?
But we don't know real size of the structure (it can depend on auth scheme)
so I doubt that would work.  So I think a radix tree would waste a lot of
space.

Skip lists might work https://lwn.net/Articles/551896/ but they don't appear
to be in the kernel yet.

I think my current approach is no worse the existing code and while there is
room for further improvement, that should probably wait until a suitable data
structure is available.

Thanks,
NeilBrown
diff mbox

Patch

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e808db561201..edf0458acd4a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2011,7 +2011,7 @@  static atomic_long_t nfs_access_nr_entries;
 static void nfs_access_free_entry(struct nfs_access_entry *entry)
 {
 	put_rpccred(entry->cred);
-	kfree(entry);
+	kfree_rcu(entry, rcu_head);
 	smp_mb__before_atomic_dec();
 	atomic_long_dec(&nfs_access_nr_entries);
 	smp_mb__after_atomic_dec();
@@ -2166,6 +2166,36 @@  out_zap:
 	return -ENOENT;
 }
 
+static int nfs_access_get_cached_rcu(struct inode *inode, struct rpc_cred *cred, struct nfs_access_entry *res)
+{
+	/* Only check the most recently returned cache entry,
+	 * but do it without locking.
+	 */
+	struct nfs_inode *nfsi = NFS_I(inode);
+	struct nfs_access_entry *cache;
+	int err = -ECHILD;
+	struct list_head *lh;
+
+	if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
+		goto out;
+	lh = rcu_dereference(nfsi->access_cache_entry_lru.prev);
+	cache = list_entry(lh, struct nfs_access_entry, lru);
+	if (lh == &nfsi->access_cache_entry_lru ||
+	    cred != cache->cred)
+		cache = NULL;
+	if (cache == NULL)
+		goto out;
+	if (!nfs_have_delegated_attributes(inode) &&
+	    !time_in_range_open(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo))
+		goto out;
+	res->jiffies = cache->jiffies;
+	res->cred = cache->cred;
+	res->mask = cache->mask;
+	err = 0;
+out:
+	return err;
+}
+
 static void nfs_access_add_rbtree(struct inode *inode, struct nfs_access_entry *set)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
@@ -2209,6 +2239,11 @@  void nfs_access_add_cache(struct inode *inode, struct nfs_access_entry *set)
 	cache->cred = get_rpccred(set->cred);
 	cache->mask = set->mask;
 
+	/* The above fields assignments must be visible
+	 * before this item appears on the lru.  We cannot easily
+	 * use rcu_assign_pointer, so just force the memory barrier.
+	 */
+	smp_wmb();
 	nfs_access_add_rbtree(inode, cache);
 
 	/* Update accounting */
@@ -2247,7 +2282,9 @@  static int nfs_do_access(struct inode *inode, struct rpc_cred *cred, int mask)
 
 	trace_nfs_access_enter(inode);
 
-	status = nfs_access_get_cached(inode, cred, &cache);
+	if (!(mask & MAY_NOT_BLOCK) ||
+	    (status = nfs_access_get_cached_rcu(inode, cred, &cache)) != 0)
+		status = nfs_access_get_cached(inode, cred, &cache);
 	if (status == 0)
 		goto out_cached;
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 9e91f2cfcd01..0eaf6c48c826 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -52,6 +52,7 @@  struct nfs_access_entry {
 	unsigned long		jiffies;
 	struct rpc_cred *	cred;
 	int			mask;
+	struct rcu_head		rcu_head;
 };
 
 struct nfs_lockowner {