diff mbox

sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race

Message ID 87y49ylq76.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown March 4, 2016, 6:20 a.m. UTC
sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer
set.  In this case it aborts the queuing of the upcall.
However it has already taken a new counted reference on "h" and
doesn't "put" it, even though it frees the data structure holding the reference.

So let's delay the "cache_get" until we know we need it.

Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.")
Signed-off-by: NeilBrown <neilb@suse.com>
---
 net/sunrpc/cache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

I found this when I was looking for something else.  Testing hasn't
shown a bug, and nor has it shown that this is bug-free.  But it looks
right.

NeilBrown

Comments

J. Bruce Fields March 14, 2016, 8:38 p.m. UTC | #1
On Fri, Mar 04, 2016 at 05:20:13PM +1100, NeilBrown wrote:
> 
> sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer
> set.  In this case it aborts the queuing of the upcall.
> However it has already taken a new counted reference on "h" and
> doesn't "put" it, even though it frees the data structure holding the reference.
> 
> So let's delay the "cache_get" until we know we need it.
> 
> Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.")
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  net/sunrpc/cache.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> I found this when I was looking for something else.  Testing hasn't
> shown a bug, and nor has it shown that this is bug-free.  But it looks
> right.

Sorry for the delay.  I agree, it seems simple enough; applying for
4.6....

--b.

> 
> NeilBrown
> 
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 273bc3a35425..008c25d1b9f9 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1182,14 +1182,14 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h)
>  	}
>  
>  	crq->q.reader = 0;
> -	crq->item = cache_get(h);
>  	crq->buf = buf;
>  	crq->len = 0;
>  	crq->readers = 0;
>  	spin_lock(&queue_lock);
> -	if (test_bit(CACHE_PENDING, &h->flags))
> +	if (test_bit(CACHE_PENDING, &h->flags)) {
> +		crq->item = cache_get(h);
>  		list_add_tail(&crq->q.list, &detail->queue);
> -	else
> +	} else
>  		/* Lost a race, no longer PENDING, so don't enqueue */
>  		ret = -EAGAIN;
>  	spin_unlock(&queue_lock);
> -- 
> 2.7.0
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
William Dauchy March 15, 2016, 12:43 p.m. UTC | #2
Hello,

On Mon, Mar 14, 2016 at 9:38 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> Sorry for the delay.  I agree, it seems simple enough; applying for
> 4.6....

It it something we can consider for stable?

Thanks,
J. Bruce Fields March 15, 2016, 1:19 p.m. UTC | #3
On Tue, Mar 15, 2016 at 01:43:37PM +0100, William Dauchy wrote:
> Hello,
> 
> On Mon, Mar 14, 2016 at 9:38 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > Sorry for the delay.  I agree, it seems simple enough; applying for
> > 4.6....
> 
> It it something we can consider for stable?

It's an unlikely-looking race, I haven't seen it reported in the wild,
and the consequences are just a reference-count leak (though admittedly
those can sometimes cause crashes later--I haven't checked for that
here).

So I wasn't planning too, but any additional information's welcomed.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
NeilBrown March 15, 2016, 8:37 p.m. UTC | #4
On Wed, Mar 16 2016, J. Bruce Fields wrote:

> On Tue, Mar 15, 2016 at 01:43:37PM +0100, William Dauchy wrote:
>> Hello,
>> 
>> On Mon, Mar 14, 2016 at 9:38 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> > Sorry for the delay.  I agree, it seems simple enough; applying for
>> > 4.6....
>> 
>> It it something we can consider for stable?
>
> It's an unlikely-looking race, I haven't seen it reported in the wild,
> and the consequences are just a reference-count leak (though admittedly
> those can sometimes cause crashes later--I haven't checked for that
> here).
>
> So I wasn't planning too, but any additional information's welcomed.

I agree that it isn't needed for stable - the race is tiny and the
consequence of losing the race is that entries get stuck in the cache
and possible an exported filesystem cannot be unmounted.  I don't think
that is a problem severe enough to meet the rules for -stable - though I
know the rules aren't always closely followed.

I wouldn't exactly object to it going to stable if someone wanted to
push, but I would rather it didn't appear in -stable before appearing in
a released (not -rc) kernel from Linus.

NeilBrown
William Dauchy March 16, 2016, 7:31 a.m. UTC | #5
Hi Bruce,

Thank you for your reply.

On Tue, Mar 15, 2016 at 9:37 PM, NeilBrown <neilb@suse.com> wrote:
> I agree that it isn't needed for stable - the race is tiny and the
> consequence of losing the race is that entries get stuck in the cache
> and possible an exported filesystem cannot be unmounted.

I believe in an industrial usage, this race will become more common;
moreover the consequence of an exported filesystem which can't be
unmounted is not something I can accept in some environment.

I accept your judgment about including it or not in -stable but I am
just complaining about the lack of further consideration for those
type of fixes in nfs generally speaking. In an industrial usage, we
are hitting races way more easily than in a normal usage; lost of them
are already fixed and we do backports ourself. We are at a point where
it becomes almost impossible to come with a proof and say "we are
hitting this race, and this patch fixes the issue, please backport in
-stable".
J. Bruce Fields March 16, 2016, 3:05 p.m. UTC | #6
On Wed, Mar 16, 2016 at 08:31:03AM +0100, William Dauchy wrote:
> Hi Bruce,
> 
> Thank you for your reply.
> 
> On Tue, Mar 15, 2016 at 9:37 PM, NeilBrown <neilb@suse.com> wrote:
> > I agree that it isn't needed for stable - the race is tiny and the
> > consequence of losing the race is that entries get stuck in the cache
> > and possible an exported filesystem cannot be unmounted.
> 
> I believe in an industrial usage, this race will become more common;

Why is that?

> moreover the consequence of an exported filesystem which can't be
> unmounted is not something I can accept in some environment.
> 
> I accept your judgment about including it or not in -stable but I am
> just complaining about the lack of further consideration for those
> type of fixes in nfs generally speaking. In an industrial usage, we
> are hitting races way more easily than in a normal usage; lost of them
> are already fixed and we do backports ourself.

Do you have a list?

> We are at a point where
> it becomes almost impossible to come with a proof and say "we are
> hitting this race, and this patch fixes the issue, please backport in
> -stable".

I don't expect *proof*, necessarily, but some sort of argument would be
helpful.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
William Dauchy March 16, 2016, 3:38 p.m. UTC | #7
Hi Bruce,

Thank your for your reply.

On Wed, Mar 16, 2016 at 4:05 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> Why is that?

It's indeed not really based on actual proof but based on the experience we had.
Faced several bugs, and it appears we had less issues by backporting a
few commits; e.g related to races, leaks, performances regressions. We
did not spent enough time on it to actually show this commit was
actually fixing an issue hit.

> Do you have a list?

I am currently trying to switch from v3.14 to v4.1 (nfs client). So
even know it is off topic, those I have in mind:
0d2a970 SUNRPC: Fix a backchannel race
6851447 SUNRPC: Fix a backchannel deadlock
0993920 SUNRPC: Prevent SYN+SYNACK+RST storms
03c78827 SUNRPC: Fix races between socket connection and destroy code
a41cbe8 Failing to send a CLOSE if file is opened WRONLY and server
reboots on a 4.x mount
39d0d3b NFS: Fix a tracepoint NULL-pointer dereference
5e99b53 nfs4: reset states to use open_stateid when returning
delegation voluntarily
e92c1e0 NFSv4: Fix a nograce recovery hang
6b55970 nfs: Fix a memory leak when meeting an unsupported state protect

I am probably wrong on most of them but we had better stability after
applying them and I probably forgot some.

> I don't expect *proof*, necessarily, but some sort of argument would be
> helpful.

In my environment, if I hit this race, it is not acceptable to not
being able to unmount the filesystem.

Thanks,
J. Bruce Fields March 16, 2016, 4 p.m. UTC | #8
On Wed, Mar 16, 2016 at 04:38:55PM +0100, William Dauchy wrote:
> Hi Bruce,
> 
> Thank your for your reply.
> 
> On Wed, Mar 16, 2016 at 4:05 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > Why is that?
> 
> It's indeed not really based on actual proof but based on the experience we had.
> Faced several bugs, and it appears we had less issues by backporting a
> few commits; e.g related to races, leaks, performances regressions. We
> did not spent enough time on it to actually show this commit was
> actually fixing an issue hit.
> 
> > Do you have a list?
> 
> I am currently trying to switch from v3.14 to v4.1 (nfs client). So
> even know it is off topic, those I have in mind:
> 0d2a970 SUNRPC: Fix a backchannel race
> 6851447 SUNRPC: Fix a backchannel deadlock
> 0993920 SUNRPC: Prevent SYN+SYNACK+RST storms
> 03c78827 SUNRPC: Fix races between socket connection and destroy code
> a41cbe8 Failing to send a CLOSE if file is opened WRONLY and server
> reboots on a 4.x mount
> 39d0d3b NFS: Fix a tracepoint NULL-pointer dereference
> 5e99b53 nfs4: reset states to use open_stateid when returning
> delegation voluntarily
> e92c1e0 NFSv4: Fix a nograce recovery hang
> 6b55970 nfs: Fix a memory leak when meeting an unsupported state protect
> 
> I am probably wrong on most of them but we had better stability after
> applying them and I probably forgot some.

Note options 2 and 3 in Documentation/stable_kernel_rules.txt.  You can
send a backport to stable@vger.kernel.org yourself.  When you do so,
please cc: the relevant maintainers (looks like Trond and Anna for the
above) and linux-nfs@vger.kernel.org, so they can ACK/NACK.

> > I don't expect *proof*, necessarily, but some sort of argument would be
> > helpful.
> 
> In my environment, if I hit this race, it is not acceptable to not
> being able to unmount the filesystem.

OK.  I'd be interested in your use case if you get the chance.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
William Dauchy March 16, 2016, 4:16 p.m. UTC | #9
On Wed, Mar 16, 2016 at 5:00 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> OK.  I'd be interested in your use case if you get the chance.

Well, in our case, we have temporary instances which mount the needed
filesystem in order to do temporary work and clean it before leaving.
In other cases, there is only one connection by nfs server and all the
filesystems are bind to it.

Thanks,
NeilBrown March 18, 2016, 6:05 a.m. UTC | #10
On Thu, Mar 17 2016, William Dauchy wrote:

> Hi Bruce,
>
> Thank your for your reply.
>
> On Wed, Mar 16, 2016 at 4:05 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> Why is that?
>
> It's indeed not really based on actual proof but based on the experience we had.
> Faced several bugs, and it appears we had less issues by backporting a
> few commits; e.g related to races, leaks, performances regressions. We
> did not spent enough time on it to actually show this commit was
> actually fixing an issue hit.
>
>> Do you have a list?
>
> I am currently trying to switch from v3.14 to v4.1 (nfs client). So
> even know it is off topic, those I have in mind:
> 0d2a970 SUNRPC: Fix a backchannel race
> 6851447 SUNRPC: Fix a backchannel deadlock
> 0993920 SUNRPC: Prevent SYN+SYNACK+RST storms
> 03c78827 SUNRPC: Fix races between socket connection and destroy code
> a41cbe8 Failing to send a CLOSE if file is opened WRONLY and server
> reboots on a 4.x mount
> 39d0d3b NFS: Fix a tracepoint NULL-pointer dereference
> 5e99b53 nfs4: reset states to use open_stateid when returning
> delegation voluntarily
> e92c1e0 NFSv4: Fix a nograce recovery hang
> 6b55970 nfs: Fix a memory leak when meeting an unsupported state protect

Commit: 7632e465feb1 ("dcache: use IS_ROOT to decide where dentry is hashed")

is my most recent complaint - it really should have been tagged for
-stable.  It is now in 3.12-stable, over 2 years later :-(

NeilBrown


>
> I am probably wrong on most of them but we had better stability after
> applying them and I probably forgot some.
>
>> I don't expect *proof*, necessarily, but some sort of argument would be
>> helpful.
>
> In my environment, if I hit this race, it is not acceptable to not
> being able to unmount the filesystem.
>
> Thanks,
> -- 
> William
diff mbox

Patch

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 273bc3a35425..008c25d1b9f9 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1182,14 +1182,14 @@  int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h)
 	}
 
 	crq->q.reader = 0;
-	crq->item = cache_get(h);
 	crq->buf = buf;
 	crq->len = 0;
 	crq->readers = 0;
 	spin_lock(&queue_lock);
-	if (test_bit(CACHE_PENDING, &h->flags))
+	if (test_bit(CACHE_PENDING, &h->flags)) {
+		crq->item = cache_get(h);
 		list_add_tail(&crq->q.list, &detail->queue);
-	else
+	} else
 		/* Lost a race, no longer PENDING, so don't enqueue */
 		ret = -EAGAIN;
 	spin_unlock(&queue_lock);