diff mbox

sunrpc/cache.c: races while updating cache entries

Message ID 20130319142342.45fddbf1@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown March 19, 2013, 3:23 a.m. UTC
On 14 Mar 2013 18:31:38 +0100 Bodo Stroesser <bstroesser@ts.fujitsu.com>
wrote:

> On 13 Mar 2013 07:55:00 +0100 NeilBrown <neilb@suse.de> wrote:
> 
> > On 11 Mar 2013 17:13:41 +0100 Bodo Stroesser <bstroesser@ts.fujitsu.com>
> > wrote:
> > 
> > > Hi,
> > > 
> > > AFAICS, there is one more race in RPC cache.
> > > 
> > > The problem showed up for the "auth.unix.ip" cache, that
> > > has a reader.
> > > 
> > > If a server thread tries to find a cache entry, it first
> > > does a lookup (or calls ip_map_cached_get() in this specific
> > > case). Then, it calls cache_check() for this entry.
> > > 
> > > If the reader updates the cache entry just between the
> > > thread's lookup and cache_check() execution, cache_check()
> > > will do an upcall for this cache entry. This is, because
> > > sunrpc_cache_update() calls cache_fresh_locked(old, 0),
> > > which sets expiry_time to 0.
> > > 
> > > Unfortunately, the reply to the upcall will not dequeue
> > > the queued cache_request, as the reply will be assigned to
> > > the cache entry newly created by the above cache update.
> > > 
> > > The result is a growing queue of orphaned cache_request
> > > structures --> memory leak.
> > > 
> > > I found this on a SLES11 SP1 with a backport of the latest
> > > patches that fix the other RPC races. On this old kernel,
> > > the problem also leads to svc_drop() being called for the
> > > affected RPC request (after svc_defer()).
> > > 
> > > Best Regards
> > > Bodo
> > 
> > I don't think this is a real problem.
> > The periodic call to "cache_clean" should find these orphaned requests and
> > purge them.  So you could get a short term memory leak, but it should
> > correct itself.
> > Do you agree?
> > 
> > Thanks,
> > NeilBrown
> 
> Meanwhile I found the missing part of the race!
> 
> It's just what I wrote above but additionally, immediately after
> the reader updated the cache, cache_clean() unhashes the old cache
> entry. In that case:
> 1) the thread will queue a request for a cache entry, that isn't
>    in the hash lists.
> 2) cache_clean() never will access this old cache entry again
> 3) every further cache update will call cache_dequeue() for a newer
>    cache entry, the request is never dequeued
> 
> --> memory leak.

Yes, I see that now.  Thanks.

I suspect that bug was introduced by commit 3af4974eb2c7867d6e16.
Prior to then, cache_clean would never remove anything with an active
reference.  If something was about to get CACHE_PENDING, it would have a
reference so cache_clean would leave it alone.

I wanted to fix this by getting the last 'put' to call cache_dequeue() if
CACHE_PENDING was still set.  However I couldn't get access to the
CACHE_PENDING flag and the cache_detail at the same place - so I gave up on
that.

The patch I have included below sets a flag when an cache item is removed
from the cache (by cache_clean) and refuses to lodge an upcall if the flag is
set.  That will ensure there is never a pending upcall when the item is
finally freed.

You patch only addresses the particular situation that you had found.  I
think it is possible that there might be other situations that also lead to
memory leaks, so I wanted a solution that would guarantee that there couldn't
be a leak.

> 
> I verified this inserting some debug instructions. In a testrun
> that triggered 6 times, and the queue printed by crash after the
> run had 6 orphaned entries.
> 
> As I wrote yesterday, I have a patch that solves the problem by
> retesting the state of the cache entry after setting CACHE_PENDING
> in cache_check(). I can send it if you like.
> 
> Bodo
> 
> P.S.:
> Maybe I'm wrong, but AFAICS, there are two further minor problems
> regarding (nearly) expired cache entries:
> a) ip_map_cached_get() in some situations can return an already
>    outdated (it uses cache_valid that not fully checks the state)
>    cache entry. If cache_check() is caclled for that entry, it will
>    fail, I think
> b) Generally, if a cache entry is returned by sunrpc_cache_lookup()
>    and the entry is in the last second of it's expiry_time, the 
>    clock might move to the next second before cache_check is called.
>    If this happens, cache_check will fail, I think.
> Do you agree?

Yes, but I'm not sure how important this is.
Normally cache entries should be refreshed well before they expire, so we
should normally find an entry with more than half of its lifetime left.

In the rare case where the scenario you describe happens, cache_check() will
return -ETIMEDOUT.
In mainline, that will cause the request to be dropped and the connection
closed so the client will try again and won't hit any race and so should get
a correct result.
In SLES11SP1, we retry the lookup and will hopefully get a better result
without having to close the connection.

So this should be rare and should fail-safe.

Does that agree with your understanding?


Thanks,
NeilBrown


From e76e6583405a3b5ff9c8d2ae1184704efb209ef0 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 19 Mar 2013 13:58:58 +1100
Subject: [PATCH] sunrpc/cache: ensure items removed from cache do not have
 pending upcalls.

It is possible for a race to set CACHE_PENDING after cache_clean()
has removed a cache entry from the cache.
If CACHE_PENDING is still set when the entry is finally 'put',
the cache_dequeue() will never happen and we can leak memory.

So set a new flag 'CACHE_CLEANED' when we remove something from
the cache, and don't queue and upcall if it is set.

If CACHE_PENDING is set before CACHE_CLEANED, the call that
cache_clean() makes to cache_fresh_unlocked() will free memory
as needed.  If CACHE_PENDING is set after CACHE_CLEANED, the
test in sunrpc_cache_pipe_upcall will ensure that the memory
is not allocated.

Reported-by: <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
diff mbox

Patch

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 303399b..8419f7d 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -57,6 +57,7 @@  struct cache_head {
 #define	CACHE_VALID	0	/* Entry contains valid data */
 #define	CACHE_NEGATIVE	1	/* Negative entry - there is no match for the key */
 #define	CACHE_PENDING	2	/* An upcall has been sent but no reply received yet*/
+#define	CACHE_CLEANED	3	/* Entry has been cleaned from cache */
 
 #define	CACHE_NEW_EXPIRY 120	/* keep new things pending confirmation for 120 seconds */
 
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 1677cfe..8e7ccbb 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -306,7 +306,7 @@  EXPORT_SYMBOL_GPL(cache_check);
  * a current pointer into that list and into the table
  * for that entry.
  *
- * Each time clean_cache is called it finds the next non-empty entry
+ * Each time cache_clean is called it finds the next non-empty entry
  * in the current table and walks the list in that entry
  * looking for entries that can be removed.
  *
@@ -453,6 +453,7 @@  static int cache_clean(void)
 			current_index ++;
 		spin_unlock(&cache_list_lock);
 		if (ch) {
+			set_bit(CACHE_CLEANED, &ch->flags);
 			cache_fresh_unlocked(ch, d);
 			cache_put(ch, d);
 		}
@@ -1176,6 +1177,9 @@  int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h)
 		warn_no_listener(detail);
 		return -EINVAL;
 	}
+	if (test_bit(CACHE_CLEANED, &h->flags))
+		/* Too late to make an upcall */
+		return -EAGAIN;
 
 	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!buf)