diff mbox

Kerberized NFS-Server Problem still present in 3.10.0-rc2

Message ID 20130620145209.GC11728@fieldses.org (mailing list archive)
State New, archived
Headers show

Commit Message

J. Bruce Fields June 20, 2013, 2:52 p.m. UTC
On Thu, Jun 20, 2013 at 10:49:55AM -0400, J. Bruce Fields wrote:
> On Thu, Jun 20, 2013 at 10:03:55AM +0200, Sven Geggus wrote:
> > J. Bruce Fields schrieb am Mittwoch, den 19. Juni um 23:34 Uhr:
> > 
> > > Apologies, I don't remember the previous discussion, so, could you
> > > summarize for me?
> > 
> > Shure!
> > My original bug-report ist here:
> > http://www.spinics.net/lists/linux-nfs/msg37454.html
> > 
> > > - you're able to reproduce/not reproduce the problem by changing
> > >   *only* the kernel on the nfs server between 3.8.x and
> > >   3.10.0-rc2 ?
> > 
> > Exactly. See also the Postings of Richard van den Toorn on the list.
> > 
> > Summary: Mount is locking when upgrading from 3.8.x to 3.9.x
> > 
> > Unfortunately I was unable to do a git bisect because somewhere on the way
> > the behaviour changed from locking to "permission denied".
> > 
> > If you give me a hint if this behaviour should be marked as good or bad I
> > can continue bisecting!
> > 
> > > - have you figured out exactly where the failure happens?:
> > 
> > No because I have not been able to do git bisect to the end.
> > 
> > > - which version of NFS are you using?
> > 
> > NFS4 with Kerberos authentication.
> 
> What happens with NFSv3?
> 
> > > Also if you haven't already it would be useful to know at exactly which
> > > step of the process it's failing.  As a first step running wireshark on
> > > the traffic between client and server and looking for a NULL
> > > init_sec_context rpc call and seeing whether it succeeds or not, would
> > > be useful.
> > 
> > I already posted a wireshark dump:
> > http://www.spinics.net/lists/linux-nfs/msg37472.html
> 
> So it looks it did a null init_sec_context establishment and then the
> server didn't reply to the first rpc call using the new context.  But
> it's hard to be sure without more details--could I get the binary dump?
> 
> That might be explained by the problem fixed by
> 3c34ae11fac3b30629581d0bfaf80f58e82cfbfb "nfsd: fix krb5 handling of
> anonymous principals", but that was already in v3.9.

Hm, or I wonder if 8b5309d41751b8a086d8e7a43abe37d9ff24559d "svcrpc: fix
failures to handle -1 uid's and gid's" (appended) would help.

--b.

commit 8b5309d41751b8a086d8e7a43abe37d9ff24559d
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Fri May 24 17:24:34 2013 -0400

    svcrpc: fix failures to handle -1 uid's and gid's
    
    As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode
    kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
    (0xffff) uid or gid would fail with a badcred error.
    
    Reported symptoms were xmbc clients failing on upgrade of the NFS
    server; examination of the network trace showed them sending -1 as the
    gid.
    
    Reported-by: Julian Sikorski <belegdol@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sven Geggus June 21, 2013, 8:32 a.m. UTC | #1
J. Bruce Fields schrieb am Donnerstag, den 20. Juni um 16:52 Uhr:

> What happens with NFSv3?

I have no Idea. I have a NFS4-only Server Setup and would need to enable v3
explicitly.

> So it looks it did a null init_sec_context establishment and then the
> server didn't reply to the first rpc call using the new context.  But
> it's hard to be sure without more details--could I get the binary dump?

Attached (Server kernel was 3.10.0-rc2 with patch below).

> Hm, or I wonder if 8b5309d41751b8a086d8e7a43abe37d9ff24559d "svcrpc: fix
> failures to handle -1 uid's and gid's" (appended) would help.

No! I just patched this into a vanilla 3.10.0-rc2. Same Problem. The NFS-mount
just hangs.


Sven
J. Bruce Fields June 24, 2013, 9:54 p.m. UTC | #2
On Fri, Jun 21, 2013 at 10:32:22AM +0200, Sven Geggus wrote:
> J. Bruce Fields schrieb am Donnerstag, den 20. Juni um 16:52 Uhr:
> 
> > What happens with NFSv3?
> 
> I have no Idea. I have a NFS4-only Server Setup and would need to enable v3
> explicitly.
> 
> > So it looks it did a null init_sec_context establishment and then the
> > server didn't reply to the first rpc call using the new context.  But
> > it's hard to be sure without more details--could I get the binary dump?
> 
> Attached (Server kernel was 3.10.0-rc2 with patch below).

OK, yes, thanks, the init_sec_context exchange does look acceptable but
the server never responds to the first real rpc (a SETCLIENTID) using
the new context.

Could you run

	strace -p $(pidof rpc.svcgssd) -s4096 -e trace=open,close,read,write

on the server while trying this?  That should capture the contents of
the relevant upcalls.  I wonder if there's something odd about the
context svcgssd is passing down.

> > Hm, or I wonder if 8b5309d41751b8a086d8e7a43abe37d9ff24559d "svcrpc: fix
> > failures to handle -1 uid's and gid's" (appended) would help.
> 
> No! I just patched this into a vanilla 3.10.0-rc2. Same Problem. The NFS-mount
> just hangs.

OK, thanks for checking.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sven Geggus June 25, 2013, 9:46 a.m. UTC | #3
J. Bruce Fields schrieb am Montag, den 24. Juni um 23:54 Uhr:

> Could you run
> 
> 	strace -p $(pidof rpc.svcgssd) -s4096 -e trace=open,close,read,write
> 
> on the server while trying this?

Hm how confidential is this information? This is a test-vm only, but should
I re-generate the krb5.keytab of the machine when posting this here?

Anyway, file attached.

Regards

Sven
Sven Geggus July 1, 2013, 12:09 p.m. UTC | #4
Just an update on this. I tried to compile Linux 3.10 now and updated
my userland to nfs-utils 1.2.8 (from Debian unstable).

The Problem persists. NFS-Server stops working ith any vaniall Kernel newer
than 3.8.x including 3.10.

Sven
Sven Geggus July 2, 2013, 8:05 a.m. UTC | #5
J. Bruce Fields schrieb am Montag, den 01. Juli um 22:52 Uhr:

> It includes the contents of /etc/krb5.keytab, so I'm guessing so.

The server is a test-vm anyway.

> It'd be interesting to know exactly where nfsd decides to drop the
> rpc that uses this new context....  Turning on all the nfsd and rpc
> debugging *might* help there.  (rpcdebug -m rpc -s all; rpcdebug -m nfsd
> -s all).

 ~/ # rpcdebug -m rpc -s all
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache
 ~/ # rpcdebug -m nfsd -s all
nfsd       sock fh export svc proc fileop auth repcache xdr lockd

Here is what I get (Kernel is vanilla 3.10.0):

Jul  2 09:52:41 vnfsrv kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Jul  2 09:52:41 vnfsrv kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Jul  2 09:52:41 vnfsrv kernel: NFSD: starting 90-second grace period (net ffffffff81556500)
Jul  2 09:54:11 vnfsrv kernel: NFSD: laundromat service - starting
Jul  2 09:54:11 vnfsrv kernel: NFSD: end of grace period
Jul  2 09:54:11 vnfsrv kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jul  2 09:55:42 vnfsrv kernel: NFSD: laundromat service - starting
Jul  2 09:55:42 vnfsrv kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff8800167aad40 TCP (listen) state change 10
Jul  2 09:56:42 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff88001657a000
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff8800162147c0 TCP (listen) state change 1
Jul  2 09:56:42 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:56:42 vnfsrv kernel: nfsd: connect from 10.1.7.30, port=740
Jul  2 09:56:42 vnfsrv kernel: svc: svc_setup_socket ffff880016a2ed40
Jul  2 09:56:42 vnfsrv kernel: setting up TCP socket for reading
Jul  2 09:56:42 vnfsrv kernel: svc: svc_setup_socket created ffff880015259000 (inet ffff8800162147c0)
Jul  2 09:56:42 vnfsrv kernel: svc: transport ffff880015259000 served by daemon ffff8800167cc000
Jul  2 09:56:42 vnfsrv kernel: svc: server ffff8800167cc000, pool 0, transport ffff880015259000, inuse=3
Jul  2 09:56:42 vnfsrv kernel: svc: tcp_recv ffff880015259000 data 1 conn 0 close 0
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff880015259000 recvfrom(ffff8800152592a4, 0) = 4
Jul  2 09:56:42 vnfsrv kernel: svc: TCP record, 40 bytes
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff880015259000 recvfrom(ffff88001607c028, 4056) = 40
Jul  2 09:56:42 vnfsrv kernel: svc: TCP final record (40 bytes)
Jul  2 09:56:42 vnfsrv kernel: svc: got len=40
Jul  2 09:56:42 vnfsrv kernel: svc: svc_authenticate (0)
Jul  2 09:56:42 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff880016576000
Jul  2 09:56:42 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:56:42 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 09:56:42 vnfsrv kernel: svc: calling dispatcher
Jul  2 09:56:42 vnfsrv kernel: svc: server ffff88001657a000 waiting for data (to = 900000)
Jul  2 09:56:42 vnfsrv kernel: nfsd_dispatch: vers 4 proc 0
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff880015259000 sendto([ffff8800165d4000 28... ], 28) = 28 (addr 10.1.7.30, port=740)
Jul  2 09:56:42 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff8800167aad40 TCP (listen) state change 10
Jul  2 09:56:42 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff8800167cc000
Jul  2 09:56:42 vnfsrv kernel: svc: socket ffff880015277500 TCP (listen) state change 1
Jul  2 09:56:42 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:56:42 vnfsrv kernel: 4nfsd: connect from unprivileged port: 10.1.7.30, port=55093
Jul  2 09:56:42 vnfsrv kernel: nfsd: connect from 10.1.7.30, port=55093
Jul  2 09:56:42 vnfsrv kernel: svc: svc_setup_socket ffff880016a10d00
Jul  2 09:56:43 vnfsrv kernel: setting up TCP socket for reading
Jul  2 09:56:43 vnfsrv kernel: svc: svc_setup_socket created ffff8800158ce000 (inet ffff880015277500)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff8800158ce000 served by daemon ffff88001657a000
Jul  2 09:56:43 vnfsrv kernel: svc: server ffff88001657a000, pool 0, transport ffff8800158ce000, inuse=3
Jul  2 09:56:43 vnfsrv kernel: svc: tcp_recv ffff8800158ce000 data 1 conn 0 close 0
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff8800158ce000 recvfrom(ffff8800158ce2a4, 0) = 4
Jul  2 09:56:43 vnfsrv kernel: svc: TCP record, 1380 bytes
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff8800158ce000 recvfrom(ffff880016570564, 2716) = 1380
Jul  2 09:56:43 vnfsrv kernel: svc: TCP final record (1380 bytes)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff880016576000
Jul  2 09:56:43 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:56:43 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 09:56:43 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:56:43 vnfsrv kernel: svc: got len=1380
Jul  2 09:56:43 vnfsrv kernel: svc: svc_authenticate (6)
Jul  2 09:56:43 vnfsrv kernel: RPC:       svcauth_gss: argv->iov_len = 1352
Jul  2 09:56:43 vnfsrv kernel: RPC:       Want update, refage=120, age=0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       krb5_encrypt returns 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       gss_import_sec_context_kerberos: returning 0
Jul  2 09:56:43 vnfsrv kernel: RPC:       gss_get_mic_v2
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff8800158ce000 sendto([ffff880016571000 232... ], 232) = 232 (addr 10.1.7.30, port=55093)
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff880015277500 TCP data ready (svsk ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff8800158ce000 served by daemon ffff880016576000
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff880015277500 TCP (connected) state change 8 (svsk ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff8800158ce000 busy, not enqueued
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff880015277500 TCP data ready (svsk ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff8800158ce000 busy, not enqueued
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff8800162147c0 TCP data ready (svsk ffff880015259000)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff880015259000 served by daemon ffff8800167cc000
Jul  2 09:56:43 vnfsrv kernel: svc_recv: found XPT_CLOSE
Jul  2 09:56:43 vnfsrv kernel: svc: svc_delete_xprt(ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: svc_tcp_sock_detach(ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: server ffff8800167cc000, pool 0, transport ffff880015259000, inuse=2
Jul  2 09:56:43 vnfsrv kernel: svc: tcp_recv ffff880015259000 data 1 conn 0 close 0
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff880015259000 recvfrom(ffff8800152592a4, 0) = 4
Jul  2 09:56:43 vnfsrv kernel: svc: svc_sock_detach(ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 09:56:43 vnfsrv kernel: svc: transport ffff8800158ce000 busy, not enqueued
Jul  2 09:56:43 vnfsrv kernel: svc: svc_sock_free(ffff8800158ce000)
Jul  2 09:56:43 vnfsrv kernel: svc: server ffff88001657a000 waiting for data (to = 900000)
Jul  2 09:56:43 vnfsrv kernel: svc: TCP record, 200 bytes
Jul  2 09:56:43 vnfsrv kernel: svc: socket ffff880015259000 recvfrom(ffff88001607c0c8, 3896) = 200
Jul  2 09:56:43 vnfsrv kernel: svc: TCP final record (200 bytes)
Jul  2 09:56:43 vnfsrv kernel: svc: got len=200
Jul  2 09:56:43 vnfsrv kernel: svc: svc_authenticate (6)
Jul  2 09:56:43 vnfsrv kernel: RPC:       svcauth_gss: argv->iov_len = 172
Jul  2 09:56:43 vnfsrv kernel: RPC:       gss_verify_mic_v2
Jul  2 09:56:43 vnfsrv kernel: RPC:       gss_get_mic_v2
Jul  2 09:56:43 vnfsrv kernel: RPC:       Want update, refage=120, age=0
Jul  2 09:56:43 vnfsrv kernel: RPC:       Want update, refage=120, age=0
Jul  2 09:56:48 vnfsrv kernel: svc: svc_process dropit
Jul  2 09:56:48 vnfsrv kernel: svc: xprt ffff880015259000 dropped request
Jul  2 09:56:48 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:57:12 vnfsrv kernel: NFSD: laundromat service - starting
Jul  2 09:57:12 vnfsrv kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jul  2 09:57:42 vnfsrv kernel: RPC:       xs_close xprt ffff88001610b800
Jul  2 09:57:42 vnfsrv kernel: RPC:       disconnected transport ffff88001610b800
Jul  2 09:57:42 vnfsrv kernel: RPC:       wake_up_first(ffff88001610b978 "xprt_sending")
Jul  2 09:57:44 vnfsrv kernel: svc: socket ffff8800162147c0 TCP (connected) state change 8 (svsk ffff880015259000)
Jul  2 09:57:44 vnfsrv kernel: svc: transport ffff880015259000 served by daemon ffff8800167cc000
Jul  2 09:57:44 vnfsrv kernel: svc: socket ffff8800162147c0 TCP data ready (svsk ffff880015259000)
Jul  2 09:57:44 vnfsrv kernel: svc: transport ffff880015259000 busy, not enqueued
Jul  2 09:57:44 vnfsrv kernel: svc_recv: found XPT_CLOSE
Jul  2 09:57:44 vnfsrv kernel: svc: svc_delete_xprt(ffff880015259000)
Jul  2 09:57:44 vnfsrv kernel: svc: svc_tcp_sock_detach(ffff880015259000)
Jul  2 09:57:44 vnfsrv kernel: svc: svc_sock_detach(ffff880015259000)
Jul  2 09:57:44 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:57:44 vnfsrv kernel: svc: socket ffff8800167aad40 TCP (listen) state change 10
Jul  2 09:57:44 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff880016576000
Jul  2 09:57:44 vnfsrv kernel: svc: socket ffff8800164ed540 TCP (listen) state change 1
Jul  2 09:57:44 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:57:44 vnfsrv kernel: nfsd: connect from 10.1.7.30, port=740
Jul  2 09:57:44 vnfsrv kernel: svc: svc_setup_socket ffff880016a2eac0
Jul  2 09:57:44 vnfsrv kernel: setting up TCP socket for reading
Jul  2 09:57:44 vnfsrv kernel: svc: svc_setup_socket created ffff88001669a000 (inet ffff8800164ed540)
Jul  2 09:57:44 vnfsrv kernel: svc: transport ffff88001669a000 served by daemon ffff8800167cc000
Jul  2 09:57:44 vnfsrv kernel: svc: server ffff8800167cc000, pool 0, transport ffff88001669a000, inuse=3
Jul  2 09:57:44 vnfsrv kernel: svc: tcp_recv ffff88001669a000 data 1 conn 0 close 0
Jul  2 09:57:44 vnfsrv kernel: svc: socket ffff88001669a000 recvfrom(ffff88001669a2a4, 0) = 4
Jul  2 09:57:44 vnfsrv kernel: svc: TCP record, 200 bytes
Jul  2 09:57:44 vnfsrv kernel: svc: socket ffff88001669a000 recvfrom(ffff88001607c0c8, 3896) = 200
Jul  2 09:57:44 vnfsrv kernel: svc: TCP final record (200 bytes)
Jul  2 09:57:44 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff88001657a000
Jul  2 09:57:44 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:57:44 vnfsrv kernel: svc: server ffff88001657a000 waiting for data (to = 900000)
Jul  2 09:57:44 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 09:57:44 vnfsrv kernel: svc: got len=200
Jul  2 09:57:44 vnfsrv kernel: svc: svc_authenticate (6)
Jul  2 09:57:44 vnfsrv kernel: RPC:       svcauth_gss: argv->iov_len = 172
Jul  2 09:57:44 vnfsrv kernel: RPC:       gss_verify_mic_v2
Jul  2 09:57:44 vnfsrv kernel: RPC:       gss_get_mic_v2
Jul  2 09:57:44 vnfsrv kernel: RPC:       Want update, refage=120, age=61
Jul  2 09:57:49 vnfsrv kernel: svc: svc_process dropit
Jul  2 09:57:49 vnfsrv kernel: svc: xprt ffff88001669a000 dropped request
Jul  2 09:57:49 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:58:42 vnfsrv kernel: NFSD: laundromat service - starting
Jul  2 09:58:42 vnfsrv kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jul  2 09:59:45 vnfsrv kernel: svc: socket ffff8800164ed540 TCP (connected) state change 8 (svsk ffff88001669a000)
Jul  2 09:59:45 vnfsrv kernel: svc: transport ffff88001669a000 served by daemon ffff8800167cc000
Jul  2 09:59:45 vnfsrv kernel: svc: socket ffff8800164ed540 TCP data ready (svsk ffff88001669a000)
Jul  2 09:59:45 vnfsrv kernel: svc: transport ffff88001669a000 busy, not enqueued
Jul  2 09:59:45 vnfsrv kernel: svc_recv: found XPT_CLOSE
Jul  2 09:59:45 vnfsrv kernel: svc: svc_delete_xprt(ffff88001669a000)
Jul  2 09:59:45 vnfsrv kernel: svc: svc_tcp_sock_detach(ffff88001669a000)
Jul  2 09:59:45 vnfsrv kernel: svc: svc_sock_detach(ffff88001669a000)
Jul  2 09:59:45 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:59:46 vnfsrv kernel: svc: socket ffff8800167aad40 TCP (listen) state change 10
Jul  2 09:59:46 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff8800167cc000
Jul  2 09:59:46 vnfsrv kernel: svc: socket ffff88001503f440 TCP (listen) state change 1
Jul  2 09:59:46 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:59:46 vnfsrv kernel: nfsd: connect from 10.1.7.30, port=740
Jul  2 09:59:46 vnfsrv kernel: svc: svc_setup_socket ffff880016accd40
Jul  2 09:59:46 vnfsrv kernel: setting up TCP socket for reading
Jul  2 09:59:46 vnfsrv kernel: svc: svc_setup_socket created ffff8800159f0000 (inet ffff88001503f440)
Jul  2 09:59:46 vnfsrv kernel: svc: transport ffff8800159f0000 served by daemon ffff880016576000
Jul  2 09:59:46 vnfsrv kernel: svc: server ffff880016576000, pool 0, transport ffff8800159f0000, inuse=3
Jul  2 09:59:46 vnfsrv kernel: svc: tcp_recv ffff8800159f0000 data 1 conn 0 close 0
Jul  2 09:59:46 vnfsrv kernel: svc: socket ffff8800159f0000 recvfrom(ffff8800159f02a4, 0) = 4
Jul  2 09:59:46 vnfsrv kernel: svc: TCP record, 200 bytes
Jul  2 09:59:46 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff88001657a000
Jul  2 09:59:46 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 09:59:46 vnfsrv kernel: svc: server ffff88001657a000 waiting for data (to = 900000)
Jul  2 09:59:46 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 09:59:46 vnfsrv kernel: svc: socket ffff8800159f0000 recvfrom(ffff8800166f80c8, 3896) = 200
Jul  2 09:59:46 vnfsrv kernel: svc: TCP final record (200 bytes)
Jul  2 09:59:46 vnfsrv kernel: svc: got len=200
Jul  2 09:59:46 vnfsrv kernel: svc: svc_authenticate (6)
Jul  2 09:59:46 vnfsrv kernel: RPC:       svcauth_gss: argv->iov_len = 172
Jul  2 09:59:46 vnfsrv kernel: RPC:       gss_verify_mic_v2
Jul  2 09:59:46 vnfsrv kernel: RPC:       gss_get_mic_v2
Jul  2 09:59:46 vnfsrv kernel: RPC:       Want update, refage=120, age=0
Jul  2 09:59:51 vnfsrv kernel: svc: svc_process dropit
Jul  2 09:59:51 vnfsrv kernel: svc: xprt ffff8800159f0000 dropped request
Jul  2 09:59:51 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 10:00:12 vnfsrv kernel: NFSD: laundromat service - starting
Jul  2 10:00:12 vnfsrv kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jul  2 10:00:47 vnfsrv kernel: svc: socket ffff88001503f440 TCP (connected) state change 8 (svsk ffff8800159f0000)
Jul  2 10:00:47 vnfsrv kernel: svc: transport ffff8800159f0000 served by daemon ffff880016576000
Jul  2 10:00:47 vnfsrv kernel: svc: socket ffff88001503f440 TCP data ready (svsk ffff8800159f0000)
Jul  2 10:00:47 vnfsrv kernel: svc: transport ffff8800159f0000 busy, not enqueued
Jul  2 10:00:47 vnfsrv kernel: svc_recv: found XPT_CLOSE
Jul  2 10:00:47 vnfsrv kernel: svc: svc_delete_xprt(ffff8800159f0000)
Jul  2 10:00:47 vnfsrv kernel: svc: svc_tcp_sock_detach(ffff8800159f0000)
Jul  2 10:00:47 vnfsrv kernel: svc: svc_sock_detach(ffff8800159f0000)
Jul  2 10:00:47 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 10:00:48 vnfsrv kernel: svc: socket ffff8800167aad40 TCP (listen) state change 10
Jul  2 10:00:48 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff880016576000
Jul  2 10:00:48 vnfsrv kernel: svc: socket ffff880015a44d80 TCP (listen) state change 1
Jul  2 10:00:48 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 10:00:48 vnfsrv kernel: nfsd: connect from 10.1.7.30, port=740
Jul  2 10:00:48 vnfsrv kernel: svc: svc_setup_socket ffff880016acc840
Jul  2 10:00:48 vnfsrv kernel: setting up TCP socket for reading
Jul  2 10:00:48 vnfsrv kernel: svc: svc_setup_socket created ffff8800152d7000 (inet ffff880015a44d80)
Jul  2 10:00:48 vnfsrv kernel: svc: transport ffff8800152d7000 served by daemon ffff8800167cc000
Jul  2 10:00:48 vnfsrv kernel: svc: server ffff8800167cc000, pool 0, transport ffff8800152d7000, inuse=3
Jul  2 10:00:48 vnfsrv kernel: svc: tcp_recv ffff8800152d7000 data 1 conn 0 close 0
Jul  2 10:00:48 vnfsrv kernel: svc: socket ffff8800152d7000 recvfrom(ffff8800152d72a4, 0) = 4
Jul  2 10:00:48 vnfsrv kernel: svc: TCP record, 200 bytes
Jul  2 10:00:48 vnfsrv kernel: svc: socket ffff8800152d7000 recvfrom(ffff88001607c0c8, 3896) = 200
Jul  2 10:00:48 vnfsrv kernel: svc: TCP final record (200 bytes)
Jul  2 10:00:48 vnfsrv kernel: svc: got len=200
Jul  2 10:00:48 vnfsrv kernel: svc: svc_authenticate (6)
Jul  2 10:00:48 vnfsrv kernel: svc: transport ffff8800159c0000 served by daemon ffff88001657a000
Jul  2 10:00:48 vnfsrv kernel: svc: tcp_accept ffff8800159c0000 sock ffff880016a2e840
Jul  2 10:00:48 vnfsrv kernel: svc: server ffff88001657a000 waiting for data (to = 900000)
Jul  2 10:00:48 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 10:00:48 vnfsrv kernel: RPC:       svcauth_gss: argv->iov_len = 172
Jul  2 10:00:48 vnfsrv kernel: RPC:       gss_verify_mic_v2
Jul  2 10:00:48 vnfsrv kernel: RPC:       gss_get_mic_v2
Jul  2 10:00:48 vnfsrv kernel: RPC:       Want update, refage=120, age=62
Jul  2 10:00:53 vnfsrv kernel: svc: svc_process dropit
Jul  2 10:00:53 vnfsrv kernel: svc: xprt ffff8800152d7000 dropped request
Jul  2 10:00:53 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 10:01:42 vnfsrv kernel: NFSD: laundromat service - starting
Jul  2 10:01:42 vnfsrv kernel: NFSD: laundromat_main - sleeping for 90 seconds
Jul  2 10:02:00 vnfsrv kernel: revisit canceled
Jul  2 10:02:00 vnfsrv kernel: svc: svc_sock_free(ffff8800159f0000)
Jul  2 10:02:00 vnfsrv kernel: revisit queued
Jul  2 10:02:00 vnfsrv kernel: svc: transport ffff8800152d7000 served by daemon ffff8800167cc000
Jul  2 10:02:00 vnfsrv kernel: svc: server ffff8800167cc000, pool 0, transport ffff8800152d7000, inuse=3
Jul  2 10:02:00 vnfsrv kernel: svc: got len=200
Jul  2 10:02:00 vnfsrv kernel: svc: transport ffff8800152d7000 served by daemon ffff880016576000
Jul  2 10:02:00 vnfsrv kernel: svc: server ffff880016576000, pool 0, transport ffff8800152d7000, inuse=5
Jul  2 10:02:00 vnfsrv kernel: svc: tcp_recv ffff8800152d7000 data 0 conn 0 close 0
Jul  2 10:02:00 vnfsrv kernel: svc: socket ffff8800152d7000 recvfrom(ffff8800152d72a0, 4) = -11
Jul  2 10:02:00 vnfsrv kernel: RPC: TCP recv_record got -11
Jul  2 10:02:00 vnfsrv kernel: svc: svc_authenticate (6)
Jul  2 10:02:00 vnfsrv kernel: RPC:       svcauth_gss: argv->iov_len = 172
Jul  2 10:02:00 vnfsrv kernel: RPC:       gss_get_mic_v2
Jul  2 10:02:00 vnfsrv kernel: RPC: TCP recvfrom got EAGAIN
Jul  2 10:02:00 vnfsrv kernel: svc: got len=0
Jul  2 10:02:00 vnfsrv kernel: svc: server ffff880016576000 waiting for data (to = 900000)
Jul  2 10:02:00 vnfsrv kernel: RPC:       Want update, refage=120, age=0
Jul  2 10:02:05 vnfsrv kernel: svc: svc_process dropit
Jul  2 10:02:05 vnfsrv kernel: svc: xprt ffff8800152d7000 dropped request
Jul  2 10:02:05 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 10:02:23 vnfsrv kernel: svc: socket ffff880015a44d80 TCP (connected) state change 8 (svsk ffff8800152d7000)
Jul  2 10:02:23 vnfsrv kernel: svc: transport ffff8800152d7000 served by daemon ffff8800167cc000
Jul  2 10:02:23 vnfsrv kernel: svc: socket ffff880015a44d80 TCP data ready (svsk ffff8800152d7000)
Jul  2 10:02:23 vnfsrv kernel: svc: transport ffff8800152d7000 busy, not enqueued
Jul  2 10:02:23 vnfsrv kernel: svc_recv: found XPT_CLOSE
Jul  2 10:02:23 vnfsrv kernel: svc: svc_delete_xprt(ffff8800152d7000)
Jul  2 10:02:23 vnfsrv kernel: svc: svc_tcp_sock_detach(ffff8800152d7000)
Jul  2 10:02:23 vnfsrv kernel: svc: svc_sock_detach(ffff8800152d7000)
Jul  2 10:02:23 vnfsrv kernel: svc: server ffff8800167cc000 waiting for data (to = 900000)
Jul  2 10:02:43 vnfsrv kernel: svc_age_temp_xprts

Regards

Sven
Sven Geggus July 8, 2013, 8:08 a.m. UTC | #6
Monday Monday, so good to me :)

On Friday, 5. Jul 2013 22:34, J. Bruce Fields wrote:

> Hm, might be interesting to see what mountd's doing too:
> 
> 	strace -p $(pidof rpc.svcgssd) -s4096 -e trace=open,close,read,write

OK, output attached. While reading this output I recogniced, that I
overlooked the following error-messages from rpc.svcgssd and rpc.mountd:

Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: ERROR: GSS-API: error in gss_free_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - Unknown error
Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: WARN: failed to free lucid sec context
Jul  8 09:53:38 vnfsrv rpc.mountd[2927]: qword_eol: fprintf failed: errno 22 (Invalid argument)

> Hm, maybe this gid upcall?  Does the following (probably not completely
> right) help?

Yes, we are getting there. While the strange messages from rpc.svcgssd
persist, the mount now seems to work again after your patch has been
applied.

Regards

Sven
J. Bruce Fields July 8, 2013, 1:35 p.m. UTC | #7
On Mon, Jul 08, 2013 at 10:08:05AM +0200, Sven Geggus wrote:
> Monday Monday, so good to me :)
> 
> On Friday, 5. Jul 2013 22:34, J. Bruce Fields wrote:
> 
> > Hm, might be interesting to see what mountd's doing too:
> > 
> > 	strace -p $(pidof rpc.svcgssd) -s4096 -e trace=open,close,read,write

Argh--I said "mountd", then I forgot to replace the rpc.svcgssd in the
cut-n-pasted commandline; should have been:

	strace -p $(pidof rpc.mountd) -s4096 -e trace=open,close,read,write

> 
> OK, output attached. While reading this output I recogniced, that I
> overlooked the following error-messages from rpc.svcgssd and rpc.mountd:
> 
> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: ERROR: GSS-API: error in gss_free_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - Unknown error
> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: WARN: failed to free lucid sec context
> Jul  8 09:53:38 vnfsrv rpc.mountd[2927]: qword_eol: fprintf failed: errno 22 (Invalid argument)
> 
> > Hm, maybe this gid upcall?  Does the following (probably not completely
> > right) help?
> 
> Yes, we are getting there. While the strange messages from rpc.svcgssd
> persist, the mount now seems to work again after your patch has been
> applied.

Anyway.  OK, that more or less solves the mystery, though the strace
output might still be interesting.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sven Geggus July 8, 2013, 3:54 p.m. UTC | #8
J. Bruce Fields schrieb am Montag, den 08. Juli um 15:35 Uhr:

> Anyway.  OK, that more or less solves the mystery, though the strace
> output might still be interesting.

Not very enlightening, at least for me, but here we go:

read(3, "nfsd 10.1.7.30\n", 2048)       = 15
close(13)                               = 0
open("/var/lib/nfs/etab", O_RDONLY)     = 13
close(13)                               = 0
close(13)                               = 0
write(3, "nfsd 10.1.7.30 1373300439 * \n", 29) = 29
read(4, "4294967295\n", 2048)           = 11
close(14)                               = 0
close(13)                               = 0
read(13, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 36) = 36
close(13)                               = 0
write(4, "4294967295 1373300439 0 \n", 25) = -1 EINVAL (Invalid argument)

Regards

Sven
J. Bruce Fields July 9, 2013, 4:57 p.m. UTC | #9
On Mon, Jul 08, 2013 at 10:08:05AM +0200, Sven Geggus wrote:
> OK, output attached. While reading this output I recogniced, that I
> overlooked the following error-messages from rpc.svcgssd and rpc.mountd:

I don't think they're causing you any practical problem, but there was
some discussion of how to destroy contexts in gssd recently that I
didn't follow closely enough.  I seem to recall Neil and Chuck being the
ones chasing down those bugs?

> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: ERROR: GSS-API: error in gss_free_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - Unknown error
> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: WARN: failed to free lucid sec context

Does rpc.svcgssd still need fixing?  (Sven, could you remind us what
nfs-utils version this is from?)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever July 9, 2013, 5:23 p.m. UTC | #10
On Jul 9, 2013, at 12:57 PM, "J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Jul 08, 2013 at 10:08:05AM +0200, Sven Geggus wrote:
>> OK, output attached. While reading this output I recogniced, that I
>> overlooked the following error-messages from rpc.svcgssd and rpc.mountd:
> 
> I don't think they're causing you any practical problem, but there was
> some discussion of how to destroy contexts in gssd recently that I
> didn't follow closely enough.  I seem to recall Neil and Chuck being the
> ones chasing down those bugs?

I haven't touched rpc.svcgssd.  I thought this code was being replaced by the gss_proxy-based implementation.

>> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: ERROR: GSS-API: error in gss_free_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - Unknown error
>> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: WARN: failed to free lucid sec context
> 
> Does rpc.svcgssd still need fixing?  (Sven, could you remind us what
> nfs-utils version this is from?)
> 
> --b.
Sven Geggus July 10, 2013, 7:55 a.m. UTC | #11
J. Bruce Fields <bfields@fieldses.org> wrote:

>> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: ERROR: GSS-API: error in gss_free_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - Unknown error
>> Jul  8 09:53:38 vnfsrv rpc.svcgssd[2914]: WARN: failed to free lucid sec context
> 
> Does rpc.svcgssd still need fixing?  (Sven, could you remind us what
> nfs-utils version this is from?)

1.2.8

Regards

Sven
diff mbox

Patch

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index c3f9e1e..06bdf5a 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -810,11 +810,15 @@  svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
 		goto badcred;
 	argv->iov_base = (void*)((__be32*)argv->iov_base + slen);	/* skip machname */
 	argv->iov_len -= slen*4;
-
+	/*
+	 * Note: we skip uid_valid()/gid_valid() checks here for
+	 * backwards compatibility with clients that use -1 id's.
+	 * Instead, -1 uid or gid is later mapped to the
+	 * (export-specific) anonymous id by nfsd_setuser.
+	 * Supplementary gid's will be left alone.
+	 */
 	cred->cr_uid = make_kuid(&init_user_ns, svc_getnl(argv)); /* uid */
 	cred->cr_gid = make_kgid(&init_user_ns, svc_getnl(argv)); /* gid */
-	if (!uid_valid(cred->cr_uid) || !gid_valid(cred->cr_gid))
-		goto badcred;
 	slen = svc_getnl(argv);			/* gids length */
 	if (slen > 16 || (len -= (slen + 2)*4) < 0)
 		goto badcred;
@@ -823,8 +827,6 @@  svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
 		return SVC_CLOSE;
 	for (i = 0; i < slen; i++) {
 		kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
-		if (!gid_valid(kgid))
-			goto badcred;
 		GROUP_AT(cred->cr_group_info, i) = kgid;
 	}
 	if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {