Message ID | k7jj0c$nak$1@ultimate100.geggus.net (mailing list archive) |
---|---|
State | New, archived |
On Fri, Nov 09, 2012 at 06:45:32PM +0000, Sven Geggus wrote:
> Sven Geggus <lists@fuchsschwanzdomain.de> wrote:
>
> > OK, I now figured out which commit did cause the problem:
> >
> > Thus "git diff 08843b79..cc8362b1" on a linux-stable tree from
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git should
> > give us the relevant changes.
>
> After a private conversation with J. Bruce Fields I figured out that
> I have not been quite there yet. So here comes a FTR post what
> exactly caused my problem.

Thanks for tracking this down--not what I would have guessed!

Given that the trace showed a problem starting around context creation
time, I'm most suspicious of the callers in rsc_parse, which are mostly
parsing uid's.

Is it possible that your system has very large uid's? (Large enough
that they'd look like negative numbers when cast to ints?)

Output from

	strace -p $(pidof rpc.mountd) -s4096 -e trace=open,close,read,write

(while reproducing the bug) might help confirm that.

--b.

>
> It is the following change:
>
> $ git diff d9c2ede63c74048dfddbb129c59ac01176b0ab71 bbf43dc888833ac0539e437dbaeb28bfd4fbab9f
> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
> index 6def1f6..af42596 100644
> --- a/include/linux/sunrpc/cache.h
> +++ b/include/linux/sunrpc/cache.h
> @@ -217,8 +217,6 @@ extern int qword_get(char **bpp, char *dest, int bufsize);
>  static inline int get_int(char **bpp, int *anint)
>  {
>  	char buf[50];
> -	char *ep;
> -	int rv;
>  	int len = qword_get(bpp, buf, sizeof(buf));
>
>  	if (len < 0)
> @@ -226,11 +224,9 @@ static inline int get_int(char **bpp, int *anint)
>  	if (len == 0)
>  		return -ENOENT;
>
> -	rv = simple_strtol(buf, &ep, 0);
> -	if (*ep)
> +	if (kstrtoint(buf, 0, anint))
>  		return -EINVAL;
>
> -	*anint = rv;
>  	return 0;
>  }
>
> Reverting this change on recent kernels makes them work for me again.
>
> Sven
>
>
> --
> Unix is simple and coherent, but it takes a genius – or at any rate a
> programmer – to understand and appreciate the simplicity
> (Dennis M. Ritchie)
> /me is giggls@ircnet, http://sven.gegg.us/ on the Web
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
In article <20121109200730.GI6171@fieldses.org> you wrote:

> Is it possible that your system has very large uid's? (Large enough
> that they'd look like negative numbers when cast to ints?)

Shurely not. however, mounting is done as root which might get mapped to
nobody which usually is 65534.

> Output from
>
> strace -p $(pidof rpc.mountd) -s4096 -e trace=open,close,read,write
>
> (while reproducing the bug) might help confirm that.

While doing the hanging or while doing the proper mount?

Sven
On Fri, Nov 09, 2012 at 11:45:41PM +0100, Sven Geggus wrote:
> In article <20121109200730.GI6171@fieldses.org> you wrote:
>
> > Is it possible that your system has very large uid's? (Large enough
> > that they'd look like negative numbers when cast to ints?)
>
> Shurely not. however, mounting is done as root which might get mapped to
> nobody which usually is 65534.
>
> > Output from
> >
> > strace -p $(pidof rpc.mountd) -s4096 -e trace=open,close,read,write
> >
> > (while reproducing the bug) might help confirm that.
>
> While doing the hanging or while doing the proper mount?

Restart the server, start strace, then try the mount, let it hang a few
seconds just to make sure you got anything interesting, then kill strace
and send the output.

I guess the results in the successful (good kernel) case might be
interesting too, but probably the bad case is enough.

--b.
J. Bruce Fields wrote on Saturday, November 10 at 00:24:

> Restart the server, start strace, then try the mount, let it hang a few
> seconds just to make sure you got anything interesting, then kill strace
> and send the output.

OK, back at work and here is what I get...

read(3, "nfsd 10.1.7.30\n", 2048) = 15
close(15) = 0
open("/var/lib/nfs/etab", O_RDONLY) = 15
close(15) = 0
close(15) = 0
write(3, "nfsd 10.1.7.30 1352710828 * \n", 29) = 29
read(4, "4294967295\n", 2048) = 11
close(16) = 0
close(15) = 0
read(15, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 36) = 36
close(15) = 0
write(4, "4294967295 1352710828 0 \n", 25) = -1 EINVAL (Invalid argument)

4294967295 is UINT_MAX, and this is the place where a good kernel behaves
differently. There the write call succeeds:

write(4, "4294967295 1352710828 0 \n", 25) = 25

Sven

P.S.: Your patched svcauth_gss.c will give me an "access denied by server"
while mounting instead of the infinite delay:

~/ # mount -t nfs4 -o sec=krb5 testsrv:/storage /mnt/
mount.nfs4: access denied by server while mounting testsrv:/storage
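The difference seen in the strace output can be reproduced outside the kernel. The following is only a userspace sketch, not the kernel code: `parse_loose` and `parse_strict` are hypothetical names, and `strtoull`/`strtoll` stand in for `simple_strtol` and `kstrtoint`. It assumes a compiler that wraps out-of-range integer conversions (as GCC and Clang do), which is roughly what the old `rv = simple_strtol(...); *anint = rv;` path amounted to on a 64-bit build.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Loose parse, approximating the pre-change get_int(): convert the
 * string with a wide type, then store it in an int without any range
 * check. The narrowing is implementation-defined in C; GCC and Clang
 * wrap, so "4294967295" ends up as -1. */
int parse_loose(const char *s, int *out)
{
    char *ep;
    unsigned long long v = strtoull(s, &ep, 0);

    if (*ep)
        return -EINVAL;   /* trailing garbage */
    *out = (int)v;        /* silent truncation, no range check */
    return 0;
}

/* Strict parse, approximating what a kstrtoint()-style helper does:
 * values that do not fit in an int are rejected, not truncated. */
int parse_strict(const char *s, int *out)
{
    char *ep;
    long long v;

    errno = 0;
    v = strtoll(s, &ep, 0);
    if (errno || ep == s || *ep || v < INT_MIN || v > INT_MAX)
        return -EINVAL;   /* not a number, or out of int range */
    *out = (int)v;
    return 0;
}
```

Under those assumptions, `parse_loose("4294967295", &v)` returns 0 and leaves `v` at -1, while `parse_strict("4294967295", &v)` returns `-EINVAL`, which matches the failing `write(4, "4294967295 ...")` above. A small value such as 65534 is accepted by both.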
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 6def1f6..af42596 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -217,8 +217,6 @@ extern int qword_get(char **bpp, char *dest, int bufsize);
 static inline int get_int(char **bpp, int *anint)
 {
 	char buf[50];
-	char *ep;
-	int rv;
 	int len = qword_get(bpp, buf, sizeof(buf));