diff mbox

kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!

Message ID 4FA345DA4F4AE44899BD2B03EEEC2FA9091FFA38@SACEXCMBX04-PRD.hq.netapp.com (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust Oct. 13, 2012, 4:42 a.m. UTC
On Fri, 2012-10-12 at 19:56 -0700, Larry McVoy wrote:
> On Sat, Oct 13, 2012 at 02:52:51AM +0000, Myklebust, Trond wrote:

> > On Fri, 2012-10-12 at 19:31 -0700, Larry McVoy wrote:

> > > As I've said, debugging this is going to be hard, it means stopping my

> > > company.  We can do one more crash, you guys can tell me what to do,

> > > but that's about it.

> > 

> > Do you have a reproducer? 

> 

> Oh yeah, it happens every time.  Run skype on a mac w/ your home dir NFS

> mounted.  Skype tries to do some lock operation and the machine panics.

> 

> We've done it 4 times in a row to be sure we weren't smoking crack.  No

> crack.  Just panics.


OK. I think I can see what is happening. We need to filter the return
values from nlm_lookup_file() when dealing with NLMv1 and NLMv3.

Can you please try the attached patch?

Cheers
  Trond

PS: you are presumably running NFSv2 on your Macs. Odd that they should
default to that...
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

Comments

Larry McVoy Oct. 14, 2012, 1:42 a.m. UTC | #1
> PS: you are presumably running NFSv2 on your Macs. Odd that they should
> default to that...

I dunno if they default to that or we force that.  We have had lots of
problems with Linux NFS, we export our home directories but they are
more or less read only.

Maybe things have gotten better but back in the day I could crash the
kernel with a bk clone to a NFS directory.

We've also had problems with NFS versions, 2 has been stable for us so
we tend to use that when we can.

If you want me to get a rant about how retarded the various NFS versions
are I can go there.  The Sun NFS people absolutely positively did not
understand what RPC version numbers are for.  And they invented them.

Sigh.
J. Bruce Fields Oct. 15, 2012, 12:43 a.m. UTC | #2
On Sat, Oct 13, 2012 at 06:42:15PM -0700, Larry McVoy wrote:
> > PS: you are presumably running NFSv2 on your Macs. Odd that they should
> > default to that...
> 
> I dunno if they default to that or we force that.  We have had lots of
> problems with Linux NFS, we export our home directories but they are
> more or less read only.
> 
> Maybe things have gotten better but back in the day I could crash the
> kernel with a bk clone to a NFS directory.

Bug reports welcomed if any of those bugs are still around.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Oct. 15, 2012, 4:38 a.m. UTC | #3
T24gU3VuLCAyMDEyLTEwLTE0IGF0IDIwOjQzIC0wNDAwLCBCcnVjZSBGaWVsZHMgd3JvdGU6DQo+
IE9uIFNhdCwgT2N0IDEzLCAyMDEyIGF0IDA2OjQyOjE1UE0gLTA3MDAsIExhcnJ5IE1jVm95IHdy
b3RlOg0KPiA+ID4gUFM6IHlvdSBhcmUgcHJlc3VtYWJseSBydW5uaW5nIE5GU3YyIG9uIHlvdXIg
TWFjcy4gT2RkIHRoYXQgdGhleSBzaG91bGQNCj4gPiA+IGRlZmF1bHQgdG8gdGhhdC4uLg0KPiA+
IA0KPiA+IEkgZHVubm8gaWYgdGhleSBkZWZhdWx0IHRvIHRoYXQgb3Igd2UgZm9yY2UgdGhhdC4g
IFdlIGhhdmUgaGFkIGxvdHMgb2YNCj4gPiBwcm9ibGVtcyB3aXRoIExpbnV4IE5GUywgd2UgZXhw
b3J0IG91ciBob21lIGRpcmVjdG9yaWVzIGJ1dCB0aGV5IGFyZQ0KPiA+IG1vcmUgb3IgbGVzcyBy
ZWFkIG9ubHkuDQo+ID4gDQo+ID4gTWF5YmUgdGhpbmdzIGhhdmUgZ290dGVuIGJldHRlciBidXQg
YmFjayBpbiB0aGUgZGF5IEkgY291bGQgY3Jhc2ggdGhlDQo+ID4ga2VybmVsIHdpdGggYSBiayBj
bG9uZSB0byBhIE5GUyBkaXJlY3RvcnkuDQo+IA0KPiBCdWcgcmVwb3J0cyB3ZWxjb21lZCBpZiBh
bnkgb2YgdGhvc2UgYnVncyBhcmUgc3RpbGwgYXJvdW5kLg0KDQpUaGUgb3RoZXIgdGhpbmcgdG8g
bm90ZSBpcyB0aGF0IGF0IHRoaXMgcG9pbnQsIE5GU3YyIGhhcyBiZWVuIGxlZ2FjeQ0KY29kZSBm
b3IgbW9yZSB0aGFuIDEwIHllYXJzIG9uIExpbnV4IGFuZCBpcyBzdGFydGluZyB0byBzdWZmZXIg
YmlnIHRpbWUNCmZyb20gYml0IHJvdC4gV2Ugc2hvdWxkIHByb2JhYmx5IGFpbSB0byByZW1vdmUg
aXQgZW50aXJlbHkgaW4gdGhlIG5leHQNCjEtMiB5ZWFycy4gVGhlcmUgaXMgbm8gcGxhY2UgaW4g
dG9kYXkncyB3b3JsZCBmb3IgYSBwcm90b2NvbCB0aGF0IGNhbg0Kb25seSBkZWFsIHdpdGggYSAy
R0IgbWF4aW11bSBmaWxlIHNpemUuLi4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5G
UyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29t
DQp3d3cubmV0YXBwLmNvbQ0K
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Oct. 15, 2012, 4:41 a.m. UTC | #4
On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> Bug reports welcomed if any of those bugs are still around.


You did see the patch that I submitted, right? Since it is a server bug,
I'm assuming that you will want to push it upstream.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
J. Bruce Fields Oct. 15, 2012, 12:11 p.m. UTC | #5
On Mon, Oct 15, 2012 at 04:41:56AM +0000, Myklebust, Trond wrote:
> On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> > Bug reports welcomed if any of those bugs are still around.
> 
> You did see the patch that I submitted, right?

Yes.

> Since it is a server bug,
> I'm assuming that you will want to push it upstream.

Sure, I can take it.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Larry McVoy Oct. 15, 2012, 2:34 p.m. UTC | #6
On Mon, Oct 15, 2012 at 04:38:02AM +0000, Myklebust, Trond wrote:
> On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> > On Sat, Oct 13, 2012 at 06:42:15PM -0700, Larry McVoy wrote:
> > > > PS: you are presumably running NFSv2 on your Macs. Odd that they should
> > > > default to that...
> > > 
> > > I dunno if they default to that or we force that.  We have had lots of
> > > problems with Linux NFS, we export our home directories but they are
> > > more or less read only.
> > > 
> > > Maybe things have gotten better but back in the day I could crash the
> > > kernel with a bk clone to a NFS directory.
> > 
> > Bug reports welcomed if any of those bugs are still around.
> 
> The other thing to note is that at this point, NFSv2 has been legacy
> code for more than 10 years on Linux and is starting to suffer big time
> from bit rot. We should probably aim to remove it entirely in the next
> 1-2 years. There is no place in today's world for a protocol that can
> only deal with a 2GB maximum file size...

Well, we support ancient machines (why?  damn good question) that only
speak NFSv2 so it would sorta suck for us if you pulled it.
J. Bruce Fields Oct. 15, 2012, 6:02 p.m. UTC | #7
On Mon, Oct 15, 2012 at 07:34:56AM -0700, Larry McVoy wrote:
> On Mon, Oct 15, 2012 at 04:38:02AM +0000, Myklebust, Trond wrote:
> > On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> > > On Sat, Oct 13, 2012 at 06:42:15PM -0700, Larry McVoy wrote:
> > > > > PS: you are presumably running NFSv2 on your Macs. Odd that they should
> > > > > default to that...
> > > > 
> > > > I dunno if they default to that or we force that.  We have had lots of
> > > > problems with Linux NFS, we export our home directories but they are
> > > > more or less read only.
> > > > 
> > > > Maybe things have gotten better but back in the day I could crash the
> > > > kernel with a bk clone to a NFS directory.
> > > 
> > > Bug reports welcomed if any of those bugs are still around.
> > 
> > The other thing to note is that at this point, NFSv2 has been legacy
> > code for more than 10 years on Linux and is starting to suffer big time
> > from bit rot. We should probably aim to remove it entirely in the next
> > 1-2 years. There is no place in today's world for a protocol that can
> > only deal with a 2GB maximum file size...
> 
> Well, we support ancient machines (why?  damn good question) that only
> speak NFSv2 so it would sorta suck for us if you pulled it.

At least on the server side, I'd prefer to be more conservative about
removing NFSv2 support.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Oct. 17, 2012, 2 p.m. UTC | #8
On Sat, Oct 13, 2012 at 04:42:05AM +0000, Myklebust, Trond wrote:
> On Fri, 2012-10-12 at 19:56 -0700, Larry McVoy wrote:
> > On Sat, Oct 13, 2012 at 02:52:51AM +0000, Myklebust, Trond wrote:
> > > On Fri, 2012-10-12 at 19:31 -0700, Larry McVoy wrote:
> > > > As I've said, debugging this is going to be hard, it means stopping my
> > > > company.  We can do one more crash, you guys can tell me what to do,
> > > > but that's about it.
> > > 
> > > Do you have a reproducer? 
> > 
> > Oh yeah, it happens every time.  Run skype on a mac w/ your home dir NFS
> > mounted.  Skype tries to do some lock operation and the machine panics.
> > 
> > We've done it 4 times in a row to be sure we weren't smoking crack.  No
> > crack.  Just panics.
> 
> OK. I think I can see what is happening. We need to filter the return
> values from nlm_lookup_file() when dealing with NLMv1 and NLMv3.
> 
> Can you please try the attached patch?

Larry, did you get a chance to test this patch?

It seems obviously right anyway, so I'll go ahead and pass it along
regardless, but it would be nice to have the confirmation.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 1f0d4d1bb77fa0216dd2fb2041cc9b44db4697e0 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Sat, 13 Oct 2012 00:30:28 -0400
Subject: [PATCH] NLM: nlm_lookup_file() may return NLMv4-specific error codes

If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.

Failure to do so, will trigger the BUG_ON() in encode_nlm_stat...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/lockd/clntxdr.c | 2 +-
 fs/lockd/svcproc.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/clntxdr.c b/fs/lockd/clntxdr.c
index 180ac34..cf04cda 100644
--- a/fs/lockd/clntxdr.c
+++ b/fs/lockd/clntxdr.c
@@ -223,7 +223,7 @@  static void encode_nlm_stat(struct xdr_stream *xdr,
 {
 	__be32 *p;
 
-	BUG_ON(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);
+	WARN_ON_ONCE(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);
 	p = xdr_reserve_space(xdr, 4);
 	*p = stat;
 }
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index d27aab1..d413af3 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -67,7 +67,8 @@  nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
 
 	/* Obtain file pointer. Not used by FREE_ALL call. */
 	if (filp != NULL) {
-		if ((error = nlm_lookup_file(rqstp, &file, &lock->fh)) != 0)
+		error = cast_status(nlm_lookup_file(rqstp, &file, &lock->fh));
+		if (error != 0)
 			goto no_locks;
 		*filp = file;
 
-- 
1.7.11.7