Message ID: 1466520807-4340-1-git-send-email-steved@redhat.com (mailing list archive)
State: New, archived
> On Jun 21, 2016, at 10:53 AM, Steve Dickson <steved@redhat.com> wrote: > > When Kerberos is enabled, the /etc/krb5.keytab file exists, > which causes both gssd daemons to start automatically. > > With rpc.gssd running, on all NFS mounts, an upcall > is done to get a GSS security context for the SETCLIENTID procedure. > > When Kerberos is not configured for NFS, meaning > there is no host/hostname@REALM principal in > the keytab, those upcalls always fail, causing > the mount to hang for several seconds. What is the root cause of the temporary hang? When you say "the upcall fails" do you mean there is no reply, or that there is a negative reply after a delay, or there is an immediate negative reply? > This patch adds an [Install] section to both > services so the services can be enabled and disabled. > The README was also updated. > > Signed-off-by: Steve Dickson <steved@redhat.com> > --- > systemd/README | 14 +++++--------- > systemd/rpc-gssd.service | 6 ++++++ > systemd/rpc-svcgssd.service | 7 +++++++ > 3 files changed, 18 insertions(+), 9 deletions(-) > > diff --git a/systemd/README b/systemd/README > index 7c43df8..58dae42 100644 > --- a/systemd/README > +++ b/systemd/README > @@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs. > It is run once by nfs-config.service. > > rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab > -is present. > -If a site needs this file present but does not want the gss daemons > -running, it should create > - /etc/systemd/system/rpc-gssd.service.d/01-disable.conf > -and > - /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf > +is present. 
If a site needs this file present but does not want > +the gss daemons running, they can be disabled by doing > + > + systemctl disable rpc-gssd > + systemctl disable rpc-svcgssd > > -containing > - [Unit] > - ConditionNull=false > diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service > index d4a3819..681f26a 100644 > --- a/systemd/rpc-gssd.service > +++ b/systemd/rpc-gssd.service > @@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils > > Type=forking > ExecStart=/usr/sbin/rpc.gssd $GSSDARGS > + > +# Only start if the service is enabled > +# and /etc/krb5.keytab exists > +[Install] > +WantedBy=multi-user.target > + > diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service > index 41177b6..4433ed7 100644 > --- a/systemd/rpc-svcgssd.service > +++ b/systemd/rpc-svcgssd.service > @@ -18,3 +18,10 @@ After=nfs-config.service > EnvironmentFile=-/run/sysconfig/nfs-utils > Type=forking > ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS > + > +# Only start if the service is enabled > +# and /etc/krb5.keytab exists > +# and when gss-proxy is not running > +[Install] > +WantedBy=multi-user.target > + > -- > 2.5.5 -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
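The effect of the patch is easiest to see on the assembled unit file. The sketch below reconstructs only the tail of rpc-gssd.service that is visible in the diff hunk above (the [Unit] section and anything above the hunk context are not shown in the diff and are omitted here), writing it to a scratch directory so it can be exercised anywhere:

```shell
# Reconstruct, in a scratch directory, the patched tail of rpc-gssd.service.
# Only the lines visible in the diff hunk are included; the rest of the
# unit file is not part of this sketch.
mkdir -p /tmp/nfs-utils-sketch
cat > /tmp/nfs-utils-sketch/rpc-gssd.service <<'EOF'
EnvironmentFile=-/run/sysconfig/nfs-utils

Type=forking
ExecStart=/usr/sbin/rpc.gssd $GSSDARGS

# Only start if the service is enabled
# and /etc/krb5.keytab exists
[Install]
WantedBy=multi-user.target
EOF

# The [Install] section is what makes `systemctl enable`/`systemctl disable`
# meaningful for this unit; confirm it is present.
grep -c '^\[Install\]' /tmp/nfs-utils-sketch/rpc-gssd.service   # prints: 1
```

With this section installed, `systemctl disable rpc-gssd` simply removes the multi-user.target want symlink, which is the behavior the patch is after.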
On 06/21/2016 11:26 AM, Chuck Lever wrote: > >> On Jun 21, 2016, at 10:53 AM, Steve Dickson <steved@redhat.com> wrote: >> >> When Kerberos is enabled, the /etc/krb5.keytab exists >> which causes the both gssd daemons to start, automatically. >> >> With rpc.gssd running, on all NFS mounts, an upcall >> is done to get GSS security context for SETCLIENTID procedure. >> >> When Kerberos is not configured for NFS, meaning >> there is no host/hostname@REALM principal in >> the key tab, those upcalls always fall causing >> the mount to hang for several seconds. > > What is the root cause of the temporary hang? All the upcalls to rpc.gssd... I think there are three for every mount. > > When you say "the upcall fails" do you mean there is > no reply, or that there is a negative reply after a > delay, or there is an immediate negative reply? Good point.. the upcalls did not fail, they just received negative replies. steved. > > >> This patch added an [Install] section to both >> services so the services can be enable and disable. >> The README was also updated. >> >> Signed-off-by: Steve Dickson <steved@redhat.com> >> --- >> systemd/README | 14 +++++--------- >> systemd/rpc-gssd.service | 6 ++++++ >> systemd/rpc-svcgssd.service | 7 +++++++ >> 3 files changed, 18 insertions(+), 9 deletions(-) >> >> diff --git a/systemd/README b/systemd/README >> index 7c43df8..58dae42 100644 >> --- a/systemd/README >> +++ b/systemd/README >> @@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs. >> It is run once by nfs-config.service. >> >> rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab >> -is present. >> -If a site needs this file present but does not want the gss daemons >> -running, it should create >> - /etc/systemd/system/rpc-gssd.service.d/01-disable.conf >> -and >> - /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf >> +is present. 
If a site needs this file present but does not want >> +the gss daemons running, they can be disabled by doing >> + >> + systemctl disable rpc-gssd >> + systemctl disable rpc-svcgssd >> >> -containing >> - [Unit] >> - ConditionNull=false >> diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service >> index d4a3819..681f26a 100644 >> --- a/systemd/rpc-gssd.service >> +++ b/systemd/rpc-gssd.service >> @@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils >> >> Type=forking >> ExecStart=/usr/sbin/rpc.gssd $GSSDARGS >> + >> +# Only start if the service is enabled >> +# and /etc/krb5.keytab exists >> +[Install] >> +WantedBy=multi-user.target >> + >> diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service >> index 41177b6..4433ed7 100644 >> --- a/systemd/rpc-svcgssd.service >> +++ b/systemd/rpc-svcgssd.service >> @@ -18,3 +18,10 @@ After=nfs-config.service >> EnvironmentFile=-/run/sysconfig/nfs-utils >> Type=forking >> ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS >> + >> +# Only start if the service is enabled >> +# and /etc/krb5.keytab exists >> +# and when gss-proxy is not runing >> +[Install] >> +WantedBy=multi-user.target >> + >> -- >> 2.5.5 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jun 21, 2016, at 11:43 AM, Steve Dickson <SteveD@redhat.com> wrote: > > > > On 06/21/2016 11:26 AM, Chuck Lever wrote: >> >>> On Jun 21, 2016, at 10:53 AM, Steve Dickson <steved@redhat.com> wrote: >>> >>> When Kerberos is enabled, the /etc/krb5.keytab exists >>> which causes the both gssd daemons to start, automatically. >>> >>> With rpc.gssd running, on all NFS mounts, an upcall >>> is done to get GSS security context for SETCLIENTID procedure. >>> >>> When Kerberos is not configured for NFS, meaning >>> there is no host/hostname@REALM principal in >>> the key tab, those upcalls always fall causing >>> the mount to hang for several seconds. >> >> What is the root cause of the temporary hang? > All the upcalls to rpc.gssd... I think there are > three for every mount. > >> >> When you say "the upcall fails" do you mean there is >> no reply, or that there is a negative reply after a >> delay, or there is an immediate negative reply? > Good point.. the upcalls did not fail, they > just received negative replies. I would say that the upcalls themselves are not the root cause of the delay if they all return immediately. Are you saying that each negative reply takes a moment? If that's the case, is there something that gssd should do to reply more quickly when there's no host or nfs service principal in the keytab? Adding administrative interface complexity to work around an underlying implementation problem might not be the best long term choice. > steved. > >> >> >>> This patch added an [Install] section to both >>> services so the services can be enable and disable. >>> The README was also updated. 
>>> >>> Signed-off-by: Steve Dickson <steved@redhat.com> >>> --- >>> systemd/README | 14 +++++--------- >>> systemd/rpc-gssd.service | 6 ++++++ >>> systemd/rpc-svcgssd.service | 7 +++++++ >>> 3 files changed, 18 insertions(+), 9 deletions(-) >>> >>> diff --git a/systemd/README b/systemd/README >>> index 7c43df8..58dae42 100644 >>> --- a/systemd/README >>> +++ b/systemd/README >>> @@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs. >>> It is run once by nfs-config.service. >>> >>> rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab >>> -is present. >>> -If a site needs this file present but does not want the gss daemons >>> -running, it should create >>> - /etc/systemd/system/rpc-gssd.service.d/01-disable.conf >>> -and >>> - /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf >>> +is present. If a site needs this file present but does not want >>> +the gss daemons running, they can be disabled by doing >>> + >>> + systemctl disable rpc-gssd >>> + systemctl disable rpc-svcgssd >>> >>> -containing >>> - [Unit] >>> - ConditionNull=false >>> diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service >>> index d4a3819..681f26a 100644 >>> --- a/systemd/rpc-gssd.service >>> +++ b/systemd/rpc-gssd.service >>> @@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils >>> >>> Type=forking >>> ExecStart=/usr/sbin/rpc.gssd $GSSDARGS >>> + >>> +# Only start if the service is enabled >>> +# and /etc/krb5.keytab exists >>> +[Install] >>> +WantedBy=multi-user.target >>> + >>> diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service >>> index 41177b6..4433ed7 100644 >>> --- a/systemd/rpc-svcgssd.service >>> +++ b/systemd/rpc-svcgssd.service >>> @@ -18,3 +18,10 @@ After=nfs-config.service >>> EnvironmentFile=-/run/sysconfig/nfs-utils >>> Type=forking >>> ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS >>> + >>> +# Only start if the service is enabled >>> +# and /etc/krb5.keytab exists >>> +# and when gss-proxy 
is not runing >>> +[Install] >>> +WantedBy=multi-user.target >>> + >>> -- >>> 2.5.5 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> Chuck Lever -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hey, On 06/21/2016 11:47 AM, Chuck Lever wrote: >>> >> When you say "the upcall fails" do you mean there is >>> >> no reply, or that there is a negative reply after a >>> >> delay, or there is an immediate negative reply? >> > Good point.. the upcalls did not fail, they >> > just received negative replies. > I would say that the upcalls themselves are not the > root cause of the delay if they all return immediately. Well when rpc.gssd is not running (aka no upcalls) the delays stop happening. > > Are you saying that each negative reply takes a moment? Yes. Even on sec=sys mounts. Which is the issue. > If that's the case, is there something that gssd should > do to reply more quickly when there's no host or nfs > service principal in the keytab? I don't think so... unless we start caching negative responses or something like that, which is way overkill, especially since the problem is solved by not starting rpc.gssd. > > Adding administrative interface complexity to work around > an underlying implementation problem might not be the best > long term choice. Well there already was a way to stop gssd from starting when Kerberos is configured but not for NFS. From the systemd/README: rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab is present. If a site needs this file present but does not want the gss daemons running, it should create /etc/systemd/system/rpc-gssd.service.d/01-disable.conf and /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf containing [Unit] ConditionNull=false Which does work and will still work... but I'm thinking it is much simpler to disable the service via the systemd command systemctl disable rpc-gssd than by creating and editing those .conf files. steved
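The older README method Steve quotes can be scripted as below. $ROOT stands in for / so the sketch is runnable outside a real system; on an actual host the drop-ins would go under /etc/systemd/system directly, followed by a `systemctl daemon-reload`:

```shell
# Older README method: drop-in files whose ConditionNull=false prevents
# the gssd units from starting. ROOT stands in for / in this sketch.
ROOT=/tmp/nfs-utils-dropin-sketch
for svc in rpc-gssd rpc-svcgssd; do
    dir="$ROOT/etc/systemd/system/$svc.service.d"
    mkdir -p "$dir"
    printf '[Unit]\nConditionNull=false\n' > "$dir/01-disable.conf"
done

cat "$ROOT/etc/systemd/system/rpc-gssd.service.d/01-disable.conf"
# prints:
# [Unit]
# ConditionNull=false
```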
> On Jun 21, 2016, at 1:20 PM, Steve Dickson <SteveD@redhat.com> wrote: > > Hey, > > On 06/21/2016 11:47 AM, Chuck Lever wrote: >>>>>> When you say "the upcall fails" do you mean there is >>>>>> no reply, or that there is a negative reply after a >>>>>> delay, or there is an immediate negative reply? >>>> Good point.. the upcalls did not fail, they >>>> just received negative replies. >> I would say that the upcalls themselves are not the >> root cause of the delay if they all return immediately. > Well when rpc.gssd is not running (aka no upcalls) > the delays stop happening. Well let me say it a different way: the mechanism of performing an upcall should be fast. The stuff that gssd is doing as a result of the upcall request may be taking longer than expected, though. If gssd is up, and has nothing to do (which I think is the case here?) then IMO that upcall should be unnoticeable. I don't expect there to be any difference between the kernel squelching an upcall, and an upcall completing immediately. >> Are you saying that each negative reply takes a moment? > Yes. Even on sec=sys mounts. Which is the issue. Yep, I get that. I've seen that behavior on occasion, and agree it should be addressed somehow. >> If that's the case, is there something that gssd should >> do to reply more quickly when there's no host or nfs >> service principal in the keytab? > I don't think so... unless we start caching negative > negative response or something like which is way > overkill especially since the problem is solved > by not starting rpc.gssd. I'd like to understand why this upcall, which should be equivalent to a no-op, is not returning an immediate answer. Three of these in a row shouldn't take more than a dozen milliseconds. How long does the upcall take when there is a service principal versus how long it takes when there isn't one? Try running gssd under strace to get some timings. Is gssd waiting for syslog or something? 
>> Adding administrative interface complexity to work around >> an underlying implementation problem might not be the best >> long term choice. > Well there already was way to stop gssd from starting when > kerberos is configured but not for NFS. From the systemd/README: > > rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab > is present. > If a site needs this file present but does not want the gss daemons > running, it should create > /etc/systemd/system/rpc-gssd.service.d/01-disable.conf > and > /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf > > containing > [Unit] > ConditionNull=false > > Which does work and will still work... but I'm thinking it is > much similar to disable the service via systemd command > systemctl disable rpc-gssd > > than creating and editing those .conf files. This should all be automatic, IMO. On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 to your mounts. No reboot, nothing to restart. Linux should be that simple. -- Chuck Lever
Sorry for the delayed response... PTO yesterday. On 06/21/2016 01:57 PM, Chuck Lever wrote: > >> On Jun 21, 2016, at 1:20 PM, Steve Dickson <SteveD@redhat.com> wrote: >> >> Hey, >> >> On 06/21/2016 11:47 AM, Chuck Lever wrote: >>>>>>> When you say "the upcall fails" do you mean there is >>>>>>> no reply, or that there is a negative reply after a >>>>>>> delay, or there is an immediate negative reply? >>>>> Good point.. the upcalls did not fail, they >>>>> just received negative replies. >>> I would say that the upcalls themselves are not the >>> root cause of the delay if they all return immediately. >> Well when rpc.gssd is not running (aka no upcalls) >> the delays stop happening. > > Well let me say it a different way: the mechanism of > performing an upcall should be fast. The stuff that gssd > is doing as a result of the upcall request may be taking > longer than expected, though. I'm pretty sure it's not the actual mechanism causing the delay... It's the act of failing (reading keytabs, maybe even pinging the KDC) that is taking the time; at least that's what the syslogs show. > > If gssd is up, and has nothing to do (which I think is > the case here?) then IMO that upcall should be unnoticeable. Well it's not... It is causing a delay. > I don't expect there to be any difference between the kernel > squelching an upcall, and an upcall completing immediately. The kernel will always make the upcall when rpc.gssd is running... I don't see how the kernel can squelch the upcall with rpc.gssd running. Not starting rpc.gssd is the only way to squelch the upcall. > > >>> Are you saying that each negative reply takes a moment? >> Yes. Even on sec=sys mounts. Which is the issue. > > Yep, I get that. I've seen that behavior on occasion, > and agree it should be addressed somehow. > > >>> If that's the case, is there something that gssd should >>> do to reply more quickly when there's no host or nfs >>> service principal in the keytab? >> I don't think so... 
unless we start caching negative >> negative response or something like which is way >> overkill especially since the problem is solved >> by not starting rpc.gssd. > > I'd like to understand why this upcall, which should be > equivalent to a no-op, is not returning an immediate > answer. Three of these in a row shouldn't take more than > a dozen milliseconds. It looks like, from the syslog timestamps, each upcall is taking ~1 sec. > > How long does the upcall take when there is a service > principal versus how long it takes when there isn't one? > Try running gssd under strace to get some timings. The keytab does have an nfs/hostname@REALM entry. So the call to the KDC is probably failing... which could be construed as a misconfiguration, but that misconfiguration should not even come into play with sec=sys mounts... IMHO... > > Is gssd waiting for syslog or something? No... it's just failing to get the machine creds for root. [snip] >> Which does work and will still work... but I'm thinking it is >> much similar to disable the service via systemd command >> systemctl disable rpc-gssd >> >> than creating and editing those .conf files. > > This should all be automatic, IMO. > > On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 > to your mounts. No reboot, nothing to restart. Linux should be > that simple. The only extra step with Linux is 'systemctl start rpc-gssd'. I don't think there is much we can do about that.... But of course... Patches are always welcomed!! 8-) TBL... When Kerberos is configured correctly for NFS everything works just fine. When Kerberos is configured, but not for NFS, it causes delays on all NFS mounts. Today, there is a method to stop rpc-gssd from blindly starting when Kerberos is configured, to eliminate that delay. This patch is just tweaking that method to make things easier. To address your concern about covering up a bug: I just don't see it... The code is doing exactly what it's asked to do. 
By default the kernel asks for a krb5i context (when rpc.gssd is run). rpc.gssd looks for a principal in the keytab; when one is found, the KDC is called... Everything is working just like it should and it is failing just like it should. I'm just trying to eliminate all this processing when it's not needed, in an easier way.. steved.
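A quick way to check the "~1 sec per upcall" figure from the syslog timestamps is to print the gap between consecutive gssd log lines. The sample log lines below are invented for illustration; on a real system you would feed in the relevant syslog extract (or journal output for the rpc-gssd unit) instead:

```shell
# Print the gap between consecutive HH:MM:SS syslog timestamps.
# The sample gssd log entries are hypothetical.
cat > /tmp/gssd-sample.log <<'EOF'
10:53:01 rpc.gssd: handling gssd upcall
10:53:02 rpc.gssd: handling gssd upcall
10:53:03 rpc.gssd: handling gssd upcall
EOF

awk '{
    split($1, t, ":")                       # HH:MM:SS -> t[1..3]
    s = t[1]*3600 + t[2]*60 + t[3]          # seconds since midnight
    if (prev) print (s - prev) "s since previous upcall"
    prev = s
}' /tmp/gssd-sample.log
# prints "1s since previous upcall" twice
```

Deltas near a full second for every upcall, on a sec=sys mount, would back up the syslog observation above.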
> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: > > Sorry for the delayed response... PTO yesterday. > >> On 06/21/2016 01:57 PM, Chuck Lever wrote: >> >>> On Jun 21, 2016, at 1:20 PM, Steve Dickson <SteveD@redhat.com> wrote: >>> >>> Hey, >>> >>> On 06/21/2016 11:47 AM, Chuck Lever wrote: >>>>>>>> When you say "the upcall fails" do you mean there is >>>>>>>> no reply, or that there is a negative reply after a >>>>>>>> delay, or there is an immediate negative reply? >>>>>> Good point.. the upcalls did not fail, they >>>>>> just received negative replies. >>>> I would say that the upcalls themselves are not the >>>> root cause of the delay if they all return immediately. >>> Well when rpc.gssd is not running (aka no upcalls) >>> the delays stop happening. >> >> Well let me say it a different way: the mechanism of >> performing an upcall should be fast. The stuff that gssd >> is doing as a result of the upcall request may be taking >> longer than expected, though. > I'm pretty sure its not the actual mechanism causing the > delay... Its the act of failing (read keytabs maybe even > ping the KDC) is what taking the time at least that's > what the sys logs show. > >> >> If gssd is up, and has nothing to do (which I think is >> the case here?) then IMO that upcall should be unnoticeable. > Well its not... It is causing a delay. > >> I don't expect there to be any difference between the kernel >> squelching an upcall, and an upcall completing immediately. > There kernel will always make the upcall when rpc.gssd > is running... I don't see how the kernel can squelch the upcall > with rpc.gssd running. Not starting rpc.gssd is the only > way to squelch the upcall. > >> >> >>>> Are you saying that each negative reply takes a moment? >>> Yes. Even on sec=sys mounts. Which is the issue. >> >> Yep, I get that. I've seen that behavior on occasion, >> and agree it should be addressed somehow. 
>> >> >>>> If that's the case, is there something that gssd should >>>> do to reply more quickly when there's no host or nfs >>>> service principal in the keytab? >>> I don't think so... unless we start caching negative >>> negative response or something like which is way >>> overkill especially since the problem is solved >>> by not starting rpc.gssd. >> >> I'd like to understand why this upcall, which should be >> equivalent to a no-op, is not returning an immediate >> answer. Three of these in a row shouldn't take more than >> a dozen milliseconds. > It looks like, from the systlog timestamps, each upcall > is taking a ~1 sec. > >> >> How long does the upcall take when there is a service >> principal versus how long it takes when there isn't one? >> Try running gssd under strace to get some timings. > the key tab does have a nfs/hosname@REALM entry. So the > call to the KDC is probably failing... which > could be construed as a misconfiguration, but > that misconfiguration should not even come into > play with sec=sys mounts... IMHO... I disagree, of course. sec=sys means the client is not going to use Kerberos to authenticate individual user requests, and users don't need a Kerberos ticket to access their files. That's still the case. I'm not aware of any promise that sec=sys means there is no Kerberos within 50 miles of that mount. If there are valid keytabs on both systems, they need to be set up correctly. If there's a misconfiguration, then gssd needs to report it precisely instead of time out. And it's just as easy to add a service principal to a keytab as it is to disable a systemd service in that case. >> Is gssd waiting for syslog or something? > No... its just failing to get the machine creds for root Clearly more is going on than that, and so far we have only some speculation. Can you provide an strace of rpc.gssd or a network capture so we can confirm what's going on? > [snip] > >>> Which does work and will still work... 
but I'm thinking it is >>> much similar to disable the service via systemd command >>> systemctl disable rpc-gssd >>> >>> than creating and editing those .conf files. >> >> This should all be automatic, IMO. >> >> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 >> to your mounts. No reboot, nothing to restart. Linux should be >> that simple. > The only extra step with Linux is to 'sysctmctl start rpc-gssd' > I don't there is much would can do about that.... Sure there is. Leave gssd running, and make sure it can respond quickly in every reasonable case. :-p > But of > course... Patches are always welcomed!! 8-) > > TBL... When kerberos is configured correctly for NFS everything > works just fine. When kerberos is configured, but not for NFS, > causes delays on all NFS mounts. This convinces me even more that there is a gssd issue here. > Today, there is a method to stop rpc-gssd from blindly starting > when kerberos is configured to eliminate that delay. I can fix my broken TV by not turning it on, and I don't notice the problem. But the problem is still there any time I want to watch TV. The problem is not fixed by disabling gssd, it's just hidden in some cases. > This patch just tweaking that method to make things easier. It makes one thing easier, and other things more difficult. As a community, I thought our goal was to make Kerberos easier to use, not easier to turn off. > To address your concern about covering up a bug. I just don't > see it... The code is doing exactly what its asked to do. > By default the kernel asks krb5i context (when rpc.gssd > is run). rpc.gssd looking for a principle in the key tab, > when found the KDC is called... > > Everything is working just like it should and it is > failing just like it should. I'm just trying to > eliminate all this process when not needed, in > an easier way.. I'm not even sure now what the use case is. The client has proper principals, but the server doesn't? 
The server should refuse the init sec context immediately. Is gssd even running on the server? Suppose there are a thousand clients and one broken server. An administrator would fix that one server by adding an extra service principal, rather than logging into a thousand clients to change a setting on each. Suppose your client wants both sys and krb5 mounts of a group of servers, and some are "misconfigured." You have to enable gssd on the client but there are still delays on the sec=sys mounts. In fact, I think that's going to be pretty common. Why add an NFS service principal on a client if you don't expect to use sec=krb5 some of the time?
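Chuck's mixed-mount scenario, written out as hypothetical /etc/fstab entries (server and export names invented): because the krb5 mount needs rpc.gssd, the sec=sys mount on the same client also pays the upcall delay described above.

```
# Hypothetical /etc/fstab: one sec=sys mount and one sec=krb5 mount
server1:/export/data    /mnt/data    nfs4  sec=sys   0 0
server2:/export/secure  /mnt/secure  nfs4  sec=krb5  0 0
```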
Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) On 06/23/2016 09:30 PM, Chuck Lever wrote: > >> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: [snip] >> the key tab does have a nfs/hosname@REALM entry. So the >> call to the KDC is probably failing... which >> could be construed as a misconfiguration, but >> that misconfiguration should not even come into >> play with sec=sys mounts... IMHO... > > I disagree, of course. sec=sys means the client is not going > to use Kerberos to authenticate individual user requests, > and users don't need a Kerberos ticket to access their files. > That's still the case. > > I'm not aware of any promise that sec=sys means there is > no Kerberos within 50 miles of that mount. I think that is the assumption... No Kerberos will be needed for sec=sys mounts. It's not needed when Kerberos is not configured. > > If there are valid keytabs on both systems, they need to > be set up correctly. If there's a misconfiguration, then > gssd needs to report it precisely instead of time out. > And it's just as easy to add a service principal to a keytab > as it is to disable a systemd service in that case. I think it's more straightforward to disable a service that is not needed than to have to add a principal to a keytab for a service that's not being used or needed. > > >>> Is gssd waiting for syslog or something? >> No... its just failing to get the machine creds for root > > Clearly more is going on than that, and so far we have only > some speculation. Can you provide an strace of rpc.gssd or > a network capture so we can confirm what's going on? Yes... Yes... and Yes.. I added you to the bz... > > >> [snip] >> >>>> Which does work and will still work... but I'm thinking it is >>>> much similar to disable the service via systemd command >>>> systemctl disable rpc-gssd >>>> >>>> than creating and editing those .conf files. >>> >>> This should all be automatic, IMO. 
>>> >>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 >>> to your mounts. No reboot, nothing to restart. Linux should be >>> that simple. >> The only extra step with Linux is to 'sysctmctl start rpc-gssd' >> I don't there is much would can do about that.... > > Sure there is. Leave gssd running, and make sure it can respond > quickly in every reasonable case. :-p > > >> But of >> course... Patches are always welcomed!! 8-) >> >> TBL... When kerberos is configured correctly for NFS everything >> works just fine. When kerberos is configured, but not for NFS, >> causes delays on all NFS mounts. > > This convinces me even more that there is a gssd issue here. > > >> Today, there is a method to stop rpc-gssd from blindly starting >> when kerberos is configured to eliminate that delay. > > I can fix my broken TV by not turning it on, and I don't > notice the problem. But the problem is still there any > time I want to watch TV. > > The problem is not fixed by disabling gssd, it's just > hidden in some cases. I agree with this 100%... All I'm saying is there should be a way to disable it when the daemon is not needed or used. Having it automatically started just because there is a keytab I thought, at first, was a good idea; now it turns out people really don't want miscellaneous daemons running. Case in point: gssproxy... It automatically comes up, but there is a way to disable it. With rpc.gssd there is not (easily). > > >> This patch just tweaking that method to make things easier. > > It makes one thing easier, and other things more difficult. > As a community, I thought our goal was to make Kerberos > easier to use, not easier to turn off. Again I can't agree with you more! But this is the case where Kerberos is *not* being used for NFS... we should make that case work as well... > > >> To address your concern about covering up a bug. I just don't >> see it... The code is doing exactly what its asked to do. 
>> By default the kernel asks krb5i context (when rpc.gssd >> is run). rpc.gssd looking for a principle in the key tab, >> when found the KDC is called... >> >> Everything is working just like it should and it is >> failing just like it should. I'm just trying to >> eliminate all this process when not needed, in >> an easier way.. > > I'm not even sure now what the use case is. The client has > proper principals, but the server doesn't? The server > should refuse the init sec context immediately. Is gssd > even running on the server? No, they don't, because they are not using Kerberos for NFS... So I guess this is what we are saying: if you want to use Kerberos for anything at all, you must configure it for NFS for your clients to work properly... I'm not sure we really want to say this. > > Suppose there are a thousand clients and one broken > server. An administrator would fix that one server by > adding an extra service principal, rather than log > into a thousand clients to change a setting on each. > > Suppose your client wants both sys and krb5 mounts of > a group of servers, and some are "misconfigured." > You have to enable gssd on the client but there are still > delays on the sec=sys mounts. In both these cases you are assuming Kerberos mounts are being used and so Kerberos should be configured for NFS. That is just not the case. > > In fact, I think that's going to be pretty common. Why add > an NFS service principal on a client if you don't expect > to use sec=krb5 some of the time? In that case adding the principal does make sense. But... Why *must* you add a principal when you know only sec=sys mounts will be used? steved.
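One way to see which side of this trade-off a given client is on is to list the service principals in its keytab. A hedged sketch (the grep pattern and messages are illustrative, and klist must be available from the Kerberos client tools):

```shell
# List NFS-related service principals in the default keytab, if any.
# Guarded so the sketch degrades gracefully where krb5 tools or a
# keytab are absent; messages here are illustrative, not canonical.
if command -v klist >/dev/null 2>&1 && [ -r /etc/krb5.keytab ]; then
    klist -k /etc/krb5.keytab | grep -E '(nfs|host)/' \
        || echo "keytab present, but no nfs/ or host/ principal"
else
    echo "no readable keytab or klist not installed"
fi
```

A keytab with no nfs/ or host/ entry is exactly the "Kerberos configured, but not for NFS" case under discussion.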
> On Jun 28, 2016, at 10:27 AM, Steve Dickson <SteveD@redhat.com> wrote: > > Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) > > On 06/23/2016 09:30 PM, Chuck Lever wrote: >> >>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: > > [snip] > >>> the key tab does have a nfs/hosname@REALM entry. So the >>> call to the KDC is probably failing... which >>> could be construed as a misconfiguration, but >>> that misconfiguration should not even come into >>> play with sec=sys mounts... IMHO... >> >> I disagree, of course. sec=sys means the client is not going >> to use Kerberos to authenticate individual user requests, >> and users don't need a Kerberos ticket to access their files. >> That's still the case. >> >> I'm not aware of any promise that sec=sys means there is >> no Kerberos within 50 miles of that mount. > I think that's is the assumption... No Kerberos will be > needed for sec=sys mounts. Its not when Kerberos is > not configured. NFSv3 sec=sys happens to mean that no Kerberos is needed. This hasn't changed either. NFSv4 sec=sys is different. Just like NFSv4 ACLs, and NFSv4 ID mapping, and NFSv4 locking, and so on. Note though that Kerberos isn't needed for NFSv4 sec=sys even when there is a keytab. The client negotiates and operates without it. >> If there are valid keytabs on both systems, they need to >> be set up correctly. If there's a misconfiguration, then >> gssd needs to report it precisely instead of time out. >> And it's just as easy to add a service principal to a keytab >> as it is to disable a systemd service in that case. > I think its more straightforward to disable a service > that is not needed than to have to add a principal to a > keytab for a service that's not being used or needed. IMO automating NFS setup so that it chooses the most secure possible settings without intervention is the best possible solution. >>>> Is gssd waiting for syslog or something? >>> No... 
its just failing to get the machine creds for root >> >> Clearly more is going on than that, and so far we have only >> some speculation. Can you provide an strace of rpc.gssd or >> a network capture so we can confirm what's going on? > Yes... Yes... and Yes.. I added you to the bz... Thanks! I'll have a look at it. >>> [snip] >>> >>>>> Which does work and will still work... but I'm thinking it is >>>>> much similar to disable the service via systemd command >>>>> systemctl disable rpc-gssd >>>>> >>>>> than creating and editing those .conf files. >>>> >>>> This should all be automatic, IMO. >>>> >>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 >>>> to your mounts. No reboot, nothing to restart. Linux should be >>>> that simple. >>> The only extra step with Linux is to 'sysctmctl start rpc-gssd' >>> I don't there is much would can do about that.... >> >> Sure there is. Leave gssd running, and make sure it can respond >> quickly in every reasonable case. :-p >> >> >>> But of >>> course... Patches are always welcomed!! 8-) >>> >>> TBL... When kerberos is configured correctly for NFS everything >>> works just fine. When kerberos is configured, but not for NFS, >>> causes delays on all NFS mounts. >> >> This convinces me even more that there is a gssd issue here. >> >> >>> Today, there is a method to stop rpc-gssd from blindly starting >>> when kerberos is configured to eliminate that delay. >> >> I can fix my broken TV by not turning it on, and I don't >> notice the problem. But the problem is still there any >> time I want to watch TV. >> >> The problem is not fixed by disabling gssd, it's just >> hidden in some cases. > I agree this %100... All I'm saying there should be a > way to disable it when the daemon is not needed or used. NFSv4 sec=sys *does* use Kerberos, when it is available. It has for years. 
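[Editorial note: as a concrete illustration of the mixed case under discussion, a client might carry both kinds of NFSv4 mounts at once. The hostname and export paths below are hypothetical placeholders.]

```
# Hypothetical /etc/fstab entries. Even for the sec=sys mount, a client
# that has /etc/krb5.keytab and a running rpc.gssd will still attempt
# krb5i for NFSv4 lease management operations.
server.example.com:/export/pub   /mnt/pub    nfs4  sec=sys   0 0
server.example.com:/export/home  /mnt/home   nfs4  sec=krb5  0 0
```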
Documentation should be updated to state that if Kerberos is configured on clients, they will attempt to use it to manage some operations that are common to all NFSv4 mount points on that client, even when a mount point uses sec=sys. Kerberos will be used for user authentication only if the client administrator has not specified a sec= setting, but the server export allows the use of Kerberos; or if the client administrator has specified a sec=krb5, sec=krb5i, or sec=krb5p setting. The reason for using Kerberos for common operations is that a client may have just one lease management principal. If the client uses sec=sys and sec=krb5 mounts, and the sec=sys mount is done first, then lease management would use sys as well. The client cannot change this principal after it has established a lease and files are open. A subsequent sec=krb5 mount will also use sec=sys for lease management. This will be surprising and insecure behavior. Therefore, all mounts from this client attempt to set up a krb5 lease management transport. The server should have an nfs/ service principal. It doesn't _require_ one, but it's a best practice to have one in place. Administrators that have Kerberos available should use it. There's no overhead to enabling it on NFS servers, as long as the list of security flavors the server returns for each export does not include Kerberos flavors. > Having it automatically started just because there is a > keytab, at first, I thought was a good idea, now > it turns not people really don't what miscellaneous > daemons running. Case in point gssproxy... Automatically > comes but there is a way to disable it. With rpc.gssd > there is not (easily). There are good reasons to disable daemons: - The daemon consumes a lot of resources. - The daemon exposes an attack surface. gssd does neither. There are good reasons not to disable daemons: - It enables simpler administration. 
- It keeps the test matrix narrow (because you have to test just one configuration, not multiple ones: gssd enabled, gssd disabled, and so on). Always enabling gssd provides both of these benefits. >>> This patch just tweaking that method to make things easier. >> >> It makes one thing easier, and other things more difficult. >> As a community, I thought our goal was to make Kerberos >> easier to use, not easier to turn off. > Again I can't agree with you more! But this is the case > were Kerberos is *not* being used for NFS... we should > make that case work as well... Agreed. But NFSv4 sec=sys *does* use Kerberos when Kerberos is configured on the system. It's a fact, and we now need to make it convenient and natural and bug-free. The choice is between increasing security and just making it work, or adding one more knob that administrators have to Google for. >>> To address your concern about covering up a bug. I just don't >>> see it... The code is doing exactly what its asked to do. >>> By default the kernel asks krb5i context (when rpc.gssd >>> is run). rpc.gssd looking for a principle in the key tab, >>> when found the KDC is called... >>> >>> Everything is working just like it should and it is >>> failing just like it should. I'm just trying to >>> eliminate all this process when not needed, in >>> an easier way.. >> >> I'm not even sure now what the use case is. The client has >> proper principals, but the server doesn't? The server >> should refuse the init sec context immediately. Is gssd >> even running on the server? > No they don't because they are not using Kerberos for NFS... OK, let's state clearly what's going on here: The client has a host/ principal. gssd is started automatically. The server has what? If the server has a keytab and an nfs/ principal, gss-proxy should be running, and there are no delays. If the server has a keytab and no nfs/ principal, gss-proxy should be running, and any init sec context should fail immediately. 
There should be no delay. (If there is a delay, that needs to be troubleshot). If the server does not have a keytab, gss-proxy will not be running, and NFSv4 clients will have to sense this. It takes a moment for each sniff. Otherwise, there's no operational difference. I'm assuming then that the problem is that Kerberos is not set up on the _server_. Can you confirm this? Also, this negotiation should be done only during the first contact of each server after a client reboot, thus the delay happens only during the first mount, not during subsequent ones. Can that also be confirmed? > So I guess this is what we are saying: > > If you what to used Kerberos for anything at all, > they must configure it for NFS for their clients > to work properly... I'm not sure we really want to > say this. Well, the clients are working properly without the server principal in place. They just have an extra delay at mount time. (you yourself pointed out in an earlier e-mail that the client is doing everything correctly, and no mention has been made of any other operational issue). We should encourage customers to set up in the most secure way possible. In this case: - Kerberos is already available in the environment - It's not _required_ only _recommended_ (clients can still use sec=sys without it) for the server to enable Kerberos, but it's a best practice I'm guessing that if gssd and gss-proxy are running on the server all the time, even when there is no keytab, that delay should go away for everyone. So: - Always run a gssd service on servers that export NFSv4 (I assume this will address the delay problem) - Recommend the NFS server be provisioned with an nfs/ principal, and explicitly specify sec=sys on exports to prevent clients from negotiating an unwanted Kerberos security setting I far prefer these fixes to adding another administrative setting on the client. 
It encourages better security, and it addresses the problem for all NFS clients that might want to try using Kerberos against Linux NFS servers, for whatever reason. >> Suppose there are a thousand clients and one broken >> server. An administrator would fix that one server by >> adding an extra service principal, rather than log >> into a thousand clients to change a setting on each. >> >> Suppose your client wants both sys and krb5 mounts of >> a group of servers, and some are "misconfigured." >> You have to enable gssd on the client but there are still >> delays on the sec=sys mounts. > In both these cases you are assuming Kerberos mounts > are being used and so Kerberos should be configured > for NFS. That is just not the case. My assumption is that administrators would prefer automatic client set up, and good security by default. There's no way to know in advance whether an administrator will want sec=sys and sec=krb5 mounts on the same system. /etc/fstab can be changed at any time, mounts can be done by hand, or the administrator can add or remove principals from /etc/krb5.keytab. Our clients have to work when there are just sec=sys mounts, or when there are sec=sys and sec=krb5 mounts. They must allow on-demand configuration of sec=krb5. They must attempt to provide the best possible level of security at all times. The out-of-the-shrinkwrap configuration must assume a mix of capabilities. >> In fact, I think that's going to be pretty common. Why add >> an NFS service principal on a client if you don't expect >> to use sec=krb5 some of the time? > In that case adding the principal does make sense. But... > > Why *must* you add a principal when you know only sec=sys > mounts will be used? Explained in detail above (and this is only for NFSv4, and is not at all a _must_). But in summary: A client will attempt to use Kerberos for NFSv4 sec=sys when there is a host/ or nfs/ principal in its keytab. That needs to be documented. 
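[Editorial note: the lease-management behavior described above can be sketched as a toy state machine. This is an illustrative model of the discussion, not kernel code; the flavor names follow the thread.]

```shell
#!/bin/sh
# Toy model of the point above: the first NFSv4 mount fixes the lease
# management flavor for the whole client, so the client always *tries*
# krb5i for the lease when rpc.gssd is available, and negotiates down
# to sys only when it is not.
lease_flavor=""
mount_nfs4() {
    sec=$1           # sec= option for this particular mount
    gssd_running=$2  # "yes" when rpc.gssd is available for upcalls
    if [ -z "$lease_flavor" ]; then
        if [ "$gssd_running" = yes ]; then
            lease_flavor=krb5i
        else
            lease_flavor=sys
        fi
    fi
    echo "sec=$sec mount; lease management uses $lease_flavor"
}
```

With gssd stopped, a sec=sys mount followed by a sec=krb5 mount leaves both using a sys lease; that is the surprising, insecure combination the text warns about.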
Our _recommendation_ is that the server be provisioned with an nfs/ principal as well when NFSv4 is used in an environment where Kerberos is present. This eliminates a costly per-mount security negotiation, and enables cryptographically strong authentication of each client that mounts that server. NFSv4 sec=sys works properly otherwise without this principal. -- Chuck Lever
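[Editorial note: for reference, provisioning such a principal with MIT Kerberos is typically a two-step kadmin operation. The realm admin principal, hostname, and keytab path below are site placeholders; the sketch only prints the commands rather than running them.]

```shell
#!/bin/sh
# Print (not execute) the usual MIT Kerberos steps for giving an NFS
# server an nfs/ service principal. All names are site placeholders.
recipe=$(cat <<'EOF'
kadmin -p admin/admin -q "addprinc -randkey nfs/server.example.com"
kadmin -p admin/admin -q "ktadd -k /etc/krb5.keytab nfs/server.example.com"
EOF
)
printf '%s\n' "$recipe"
```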
> On Jun 28, 2016, at 12:27 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > >> >> On Jun 28, 2016, at 10:27 AM, Steve Dickson <SteveD@redhat.com> wrote: >> >> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) >> >> On 06/23/2016 09:30 PM, Chuck Lever wrote: >>> >>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: >> >> [snip] >> >>>> the key tab does have a nfs/hosname@REALM entry. So the >>>> call to the KDC is probably failing... which >>>> could be construed as a misconfiguration, but >>>> that misconfiguration should not even come into >>>> play with sec=sys mounts... IMHO... >>> >>> I disagree, of course. sec=sys means the client is not going >>> to use Kerberos to authenticate individual user requests, >>> and users don't need a Kerberos ticket to access their files. >>> That's still the case. >>> >>> I'm not aware of any promise that sec=sys means there is >>> no Kerberos within 50 miles of that mount. >> I think that's is the assumption... No Kerberos will be >> needed for sec=sys mounts. Its not when Kerberos is >> not configured. > > NFSv3 sec=sys happens to mean that no Kerberos is needed. > This hasn't changed either. > > NFSv4 sec=sys is different. Just like NFSv4 ACLs, and > NFSv4 ID mapping, and NFSv4 locking, and so on. > > Note though that Kerberos isn't needed for NFSv4 sec=sys > even when there is a keytab. The client negotiates and > operates without it. > > >>> If there are valid keytabs on both systems, they need to >>> be set up correctly. If there's a misconfiguration, then >>> gssd needs to report it precisely instead of time out. >>> And it's just as easy to add a service principal to a keytab >>> as it is to disable a systemd service in that case. >> I think its more straightforward to disable a service >> that is not needed than to have to add a principal to a >> keytab for a service that's not being used or needed. 
> > IMO automating NFS setup so that it chooses the most > secure possible settings without intervention is the > best possible solution. > > >>>>> Is gssd waiting for syslog or something? >>>> No... its just failing to get the machine creds for root >>> >>> Clearly more is going on than that, and so far we have only >>> some speculation. Can you provide an strace of rpc.gssd or >>> a network capture so we can confirm what's going on? >> Yes... Yes... and Yes.. I added you to the bz... > > Thanks! I'll have a look at it. > > >>>> [snip] >>>> >>>>>> Which does work and will still work... but I'm thinking it is >>>>>> much similar to disable the service via systemd command >>>>>> systemctl disable rpc-gssd >>>>>> >>>>>> than creating and editing those .conf files. >>>>> >>>>> This should all be automatic, IMO. >>>>> >>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 >>>>> to your mounts. No reboot, nothing to restart. Linux should be >>>>> that simple. >>>> The only extra step with Linux is to 'sysctmctl start rpc-gssd' >>>> I don't there is much would can do about that.... >>> >>> Sure there is. Leave gssd running, and make sure it can respond >>> quickly in every reasonable case. :-p >>> >>> >>>> But of >>>> course... Patches are always welcomed!! 8-) >>>> >>>> TBL... When kerberos is configured correctly for NFS everything >>>> works just fine. When kerberos is configured, but not for NFS, >>>> causes delays on all NFS mounts. >>> >>> This convinces me even more that there is a gssd issue here. >>> >>> >>>> Today, there is a method to stop rpc-gssd from blindly starting >>>> when kerberos is configured to eliminate that delay. >>> >>> I can fix my broken TV by not turning it on, and I don't >>> notice the problem. But the problem is still there any >>> time I want to watch TV. >>> >>> The problem is not fixed by disabling gssd, it's just >>> hidden in some cases. >> I agree this %100... 
All I'm saying there should be a >> way to disable it when the daemon is not needed or used. > > NFSv4 sec=sys *does* use Kerberos, when it is available. > It has for years. > > Documentation should be updated to state that if Kerberos > is configured on clients, they will attempt to use it to > manage some operations that are common to all NFSv4 mount > points on that client, even when a mount point uses sec=sys. > > Kerberos will be used for user authentication only if the > client administrator has not specified a sec= setting, but > the server export allows the use of Kerberos; or if the > client administrator has specified a sec=krb5, sec=krb5i, > or sec=krb5p setting. > > The reason for using Kerberos for common operations is > that a client may have just one lease management principal. > If the client uses sec=sys and sec=krb5 mounts, and the > sec=sys mount is done first, then lease management would use > sys as well. The client cannot change this principal after > it has established a lease and files are open. > > A subsequent sec=krb5 mount will also use sec=sys for > lease management. This will be surprising and insecure > behavior. Therefore, all mounts from this client attempt > to set up a krb5 lease management transport. Chuck, Thanks for explaining this so well! This definitely should make its way into documentation - we should have added something like this a long time ago. I'm definitely guilty of having figured out why the client worked this way and not documenting it... -dros > > The server should have an nfs/ service principal. It > doesn't _require_ one, but it's a best practice to have > one in place. > > Administrators that have Kerberos available should use > it. There's no overhead to enabling it on NFS servers, > as long as the list of security flavors the server > returns for each export does not include Kerberos > flavors. 
> > >> Having it automatically started just because there is a >> keytab, at first, I thought was a good idea, now >> it turns not people really don't what miscellaneous >> daemons running. Case in point gssproxy... Automatically >> comes but there is a way to disable it. With rpc.gssd >> there is not (easily). > > There are good reasons to disable daemons: > > - The daemon consumes a lot of resources. > - The daemon exposes an attack surface. > > gssd does neither. > > There are good reasons not to disable daemons: > > - It enables simpler administration. > - It keeps the test matrix narrow (because you > have to test just one configuration, not > multiple ones: gssd enabled, gssd disabled, > and so on). > > Always enabling gssd provides both of these benefits. > > >>>> This patch just tweaking that method to make things easier. >>> >>> It makes one thing easier, and other things more difficult. >>> As a community, I thought our goal was to make Kerberos >>> easier to use, not easier to turn off. >> Again I can't agree with you more! But this is the case >> were Kerberos is *not* being used for NFS... we should >> make that case work as well... > > Agreed. > > But NFSv4 sec=sys *does* use Kerberos when Kerberos is > configured on the system. It's a fact, and we now need to > make it convenient and natural and bug-free. The choice is > between increasing security and just making it work, or > adding one more knob that administrators have to Google for. > > >>>> To address your concern about covering up a bug. I just don't >>>> see it... The code is doing exactly what its asked to do. >>>> By default the kernel asks krb5i context (when rpc.gssd >>>> is run). rpc.gssd looking for a principle in the key tab, >>>> when found the KDC is called... >>>> >>>> Everything is working just like it should and it is >>>> failing just like it should. I'm just trying to >>>> eliminate all this process when not needed, in >>>> an easier way.. 
>>> >>> I'm not even sure now what the use case is. The client has >>> proper principals, but the server doesn't? The server >>> should refuse the init sec context immediately. Is gssd >>> even running on the server? >> No they don't because they are not using Kerberos for NFS... > > OK, let's state clearly what's going on here: > > > The client has a host/ principal. gssd is started > automatically. > > > The server has what? > > If the server has a keytab and an nfs/ principal, > gss-proxy should be running, and there are no delays. > > If the server has a keytab and no nfs/ principal, > gss-proxy should be running, and any init sec > context should fail immediately. There should be no > delay. (If there is a delay, that needs to be > troubleshot). > > If the server does not have a keytab, gss-proxy will > not be running, and NFSv4 clients will have to sense > this. It takes a moment for each sniff. Otherwise, > there's no operational difference. > > > I'm assuming then that the problem is that Kerberos > is not set up on the _server_. Can you confirm this? > > Also, this negotiation should be done only during > the first contact of each server after a client > reboot, thus the delay happens only during the first > mount, not during subsequent ones. Can that also be > confirmed? > > >> So I guess this is what we are saying: >> >> If you what to used Kerberos for anything at all, >> they must configure it for NFS for their clients >> to work properly... I'm not sure we really want to >> say this. > > Well, the clients are working properly without the > server principal in place. They just have an extra > delay at mount time. (you yourself pointed out in > an earlier e-mail that the client is doing everything > correctly, and no mention has been made of any other > operational issue). > > We should encourage customers to set up in the most > secure way possible. 
In this case: > > - Kerberos is already available in the environment > > - It's not _required_ only _recommended_ (clients can > still use sec=sys without it) for the server to > enable Kerberos, but it's a best practice > > I'm guessing that if gssd and gss-proxy are running on > the server all the time, even when there is no keytab, > that delay should go away for everyone. So: > > - Always run a gssd service on servers that export NFSv4 > (I assume this will address the delay problem) > > - Recommend the NFS server be provisioned with an nfs/ > principal, and explicitly specify sec=sys on exports > to prevent clients from negotiating an unwanted Kerberos > security setting > > I far prefer these fixes to adding another administrative > setting on the client. It encourages better security, and > it addresses the problem for all NFS clients that might > want to try using Kerberos against Linux NFS servers, for > whatever reason. > > >>> Suppose there are a thousand clients and one broken >>> server. An administrator would fix that one server by >>> adding an extra service principal, rather than log >>> into a thousand clients to change a setting on each. >>> >>> Suppose your client wants both sys and krb5 mounts of >>> a group of servers, and some are "misconfigured." >>> You have to enable gssd on the client but there are still >>> delays on the sec=sys mounts. >> In both these cases you are assuming Kerberos mounts >> are being used and so Kerberos should be configured >> for NFS. That is just not the case. > > My assumption is that administrators would prefer automatic > client set up, and good security by default. > > There's no way to know in advance whether an administrator > will want sec=sys and sec=krb5 mounts on the same system. > /etc/fstab can be changed at any time, mounts can be done > by hand, or the administrator can add or remove principals > from /etc/krb5.keytab. 
> > Our clients have to work when there are just sec=sys > mounts, or when there are sec=sys and sec=krb5 mounts. > They must allow on-demand configuration of sec=krb5. They > must attempt to provide the best possible level of security > at all times. > > The out-of-the-shrinkwrap configuration must assume a mix > of capabilities. > > >>> In fact, I think that's going to be pretty common. Why add >>> an NFS service principal on a client if you don't expect >>> to use sec=krb5 some of the time? >> In that case adding the principal does make sense. But... >> >> Why *must* you add a principal when you know only sec=sys >> mounts will be used? > > Explained in detail above (and this is only for NFSv4, and > is not at all a _must_). But in summary: > > A client will attempt to use Kerberos for NFSv4 sec=sys when > there is a host/ or nfs/ principal in its keytab. That needs > to be documented. > > Our _recommendation_ is that the server be provisioned with > an nfs/ principal as well when NFSv4 is used in an environment > where Kerberos is present. This eliminates a costly per-mount > security negotiation, and enables cryptographically strong > authentication of each client that mounts that server. NFSv4 > sec=sys works properly otherwise without this principal. > > > -- > Chuck Lever
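[Editorial note: the server-side case analysis quoted in full above reduces to a small decision table. The sketch below just restates it; the behaviors are the expected outcomes described in the discussion, not measured results.]

```shell
#!/bin/sh
# Restates the three server configurations discussed above and the
# client-visible behavior expected for each. Illustrative only.
server_gss_behavior() {
    keytab=$1      # "yes" if the server has /etc/krb5.keytab
    principal=$2   # "yes" if the keytab holds an nfs/ principal
    if [ "$keytab" = yes ] && [ "$principal" = yes ]; then
        echo "context established; no delay"
    elif [ "$keytab" = yes ]; then
        echo "init sec context refused immediately; no delay expected"
    else
        echo "no gss service; client must sense this (brief first-contact delay)"
    fi
}
```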
On 06/28/2016 12:27 PM, Chuck Lever wrote: > >> On Jun 28, 2016, at 10:27 AM, Steve Dickson <SteveD@redhat.com> wrote: >> >> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) >> >> On 06/23/2016 09:30 PM, Chuck Lever wrote: >>> >>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: >> >> [snip] >> >>>> the key tab does have a nfs/hosname@REALM entry. So the >>>> call to the KDC is probably failing... which >>>> could be construed as a misconfiguration, but >>>> that misconfiguration should not even come into >>>> play with sec=sys mounts... IMHO... >>> >>> I disagree, of course. sec=sys means the client is not going >>> to use Kerberos to authenticate individual user requests, >>> and users don't need a Kerberos ticket to access their files. >>> That's still the case. >>> >>> I'm not aware of any promise that sec=sys means there is >>> no Kerberos within 50 miles of that mount. >> I think that's is the assumption... No Kerberos will be >> needed for sec=sys mounts. Its not when Kerberos is >> not configured. > > NFSv3 sec=sys happens to mean that no Kerberos is needed. > This hasn't changed either. > > NFSv4 sec=sys is different. Just like NFSv4 ACLs, and > NFSv4 ID mapping, and NFSv4 locking, and so on. > > Note though that Kerberos isn't needed for NFSv4 sec=sys > even when there is a keytab. The client negotiates and > operates without it. If there is a keytab... there will be rpc.gssd running, which will cause an upcall... and the negotiation starts with krb5i... So yes, it's not needed, but it will be tried. > > >>> If there are valid keytabs on both systems, they need to >>> be set up correctly. If there's a misconfiguration, then >>> gssd needs to report it precisely instead of time out. >>> And it's just as easy to add a service principal to a keytab >>> as it is to disable a systemd service in that case. 
>> I think its more straightforward to disable a service >> that is not needed than to have to add a principal to a >> keytab for a service that's not being used or needed. > > IMO automating NFS setup so that it chooses the most > secure possible settings without intervention is the > best possible solution. Sure... now back to the point. ;-) >>> >>> The problem is not fixed by disabling gssd, it's just >>> hidden in some cases. >> I agree this %100... All I'm saying there should be a >> way to disable it when the daemon is not needed or used. > > NFSv4 sec=sys *does* use Kerberos, when it is available. > It has for years. Right... let's define "available": when rpc.gssd is running. When rpc.gssd is not running, Kerberos is not available. > > Documentation should be updated to state that if Kerberos > is configured on clients, they will attempt to use it to > manage some operations that are common to all NFSv4 mount > points on that client, even when a mount point uses sec=sys. > > Kerberos will be used for user authentication only if the > client administrator has not specified a sec= setting, but > the server export allows the use of Kerberos; or if the > client administrator has specified a sec=krb5, sec=krb5i, > or sec=krb5p setting. > > The reason for using Kerberos for common operations is > that a client may have just one lease management principal. > If the client uses sec=sys and sec=krb5 mounts, and the > sec=sys mount is done first, then lease management would use > sys as well. The client cannot change this principal after > it has established a lease and files are open. > > A subsequent sec=krb5 mount will also use sec=sys for > lease management. This will be surprising and insecure > behavior. Therefore, all mounts from this client attempt > to set up a krb5 lease management transport. > > The server should have an nfs/ service principal. It > doesn't _require_ one, but it's a best practice to have > one in place. 
Yeah, our documentation is lacking in this area... > > Administrators that have Kerberos available should use > it. There's no overhead to enabling it on NFS servers, > as long as the list of security flavors the server > returns for each export does not include Kerberos > flavors. Admins are going to do what they want no matter what we say... IMHO... > > >> Having it automatically started just because there is a >> keytab, at first, I thought was a good idea, now >> it turns not people really don't what miscellaneous >> daemons running. Case in point gssproxy... Automatically >> comes but there is a way to disable it. With rpc.gssd >> there is not (easily). > > There are good reasons to disable daemons: > > - The daemon consumes a lot of resources. > - The daemon exposes an attack surface. > > gssd does neither. How about when it's not needed? No rpc.gssd... no upcall... no problem... ;-) > > There are good reasons not to disable daemons: I'm assuming you meant "to disable" or "not to enable" here. > > - It enables simpler administration. > - It keeps the test matrix narrow (because you > have to test just one configuration, not > multiple ones: gssd enabled, gssd disabled, > and so on). > > Always enabling gssd provides both of these benefits. This is a production environment, so there is no testing, but simpler admin is never a bad thing. > > >>>> This patch just tweaking that method to make things easier. >>> >>> It makes one thing easier, and other things more difficult. >>> As a community, I thought our goal was to make Kerberos >>> easier to use, not easier to turn off. >> Again I can't agree with you more! But this is the case >> were Kerberos is *not* being used for NFS... we should >> make that case work as well... > > Agreed. > > But NFSv4 sec=sys *does* use Kerberos when Kerberos is > configured on the system. It's a fact, and we now need to > make it convenient and natural and bug-free. 
The choice is > between increasing security and just making it work, or > adding one more knob that administrators have to Google for. If they do not want to use Kerberos for NFS, whether it is a good idea or not, we cannot force them to... Or can we? > > >>>> To address your concern about covering up a bug. I just don't >>>> see it... The code is doing exactly what its asked to do. >>>> By default the kernel asks krb5i context (when rpc.gssd >>>> is run). rpc.gssd looking for a principle in the key tab, >>>> when found the KDC is called... >>>> >>>> Everything is working just like it should and it is >>>> failing just like it should. I'm just trying to >>>> eliminate all this process when not needed, in >>>> an easier way.. >>> >>> I'm not even sure now what the use case is. The client has >>> proper principals, but the server doesn't? The server >>> should refuse the init sec context immediately. Is gssd >>> even running on the server? >> No they don't because they are not using Kerberos for NFS... > > OK, let's state clearly what's going on here: > > > The client has a host/ principal. gssd is started > automatically. > > > The server has what? No info on the server other than it's Linux and the NFS server is running. > > If the server has a keytab and an nfs/ principal, > gss-proxy should be running, and there are no delays. In my testing, when gss-proxy is not running the mount hangs. > > If the server has a keytab and no nfs/ principal, > gss-proxy should be running, and any init sec > context should fail immediately. There should be no > delay. (If there is a delay, that needs to be > troubleshot). > > If the server does not have a keytab, gss-proxy will > not be running, and NFSv4 clients will have to sense > this. It takes a moment for each sniff. Otherwise, > there's no operational difference. > > > I'm assuming then that the problem is that Kerberos > is not set up on the _server_. Can you confirm this? I'll try... 
but we should not have to force people to set up Kerberos
on a server they are not going to use it on.

> 
> Also, this negotiation should be done only during
> the first contact of each server after a client
> reboot, thus the delay happens only during the first
> mount, not during subsequent ones. Can that also be
> confirmed?

It appears it happens on all of them.

> 
> 
>> So I guess this is what we are saying:
>> 
>> If you want to use Kerberos for anything at all,
>> you must configure it for NFS for your clients
>> to work properly... I'm not sure we really want to
>> say this.
> 
> Well, the clients are working properly without the
> server principal in place. They just have an extra
> delay at mount time. (you yourself pointed out in
> an earlier e-mail that the client is doing everything
> correctly, and no mention has been made of any other
> operational issue).

This appeared to be the case.

> 
> We should encourage customers to set up in the most
> secure way possible. In this case:
> 
> - Kerberos is already available in the environment
> 
> - It's not _required_ only _recommended_ (clients can
> still use sec=sys without it) for the server to
> enable Kerberos, but it's a best practice
> 
> I'm guessing that if gssd and gss-proxy are running on
> the server all the time, even when there is no keytab,
> that delay should go away for everyone. So:
> 
> - Always run a gssd service on servers that export NFSv4
> (I assume this will address the delay problem)
> 
> - Recommend the NFS server be provisioned with an nfs/
> principal, and explicitly specify sec=sys on exports
> to prevent clients from negotiating an unwanted Kerberos
> security setting

Or don't start rpc.gssd... ;-)

> 
> I far prefer these fixes to adding another administrative
> setting on the client. It encourages better security, and
> it addresses the problem for all NFS clients that might
> want to try using Kerberos against Linux NFS servers, for
> whatever reason.

As you say, we can only recommend...
If they don't want to use secure mounts in a Kerberos
environment, we should not make them, is all I'm saying.

> 
> 
>>> Suppose there are a thousand clients and one broken
>>> server. An administrator would fix that one server by
>>> adding an extra service principal, rather than log
>>> into a thousand clients to change a setting on each.
>>> 
>>> Suppose your client wants both sys and krb5 mounts of
>>> a group of servers, and some are "misconfigured."
>>> You have to enable gssd on the client but there are still
>>> delays on the sec=sys mounts.
>> In both these cases you are assuming Kerberos mounts
>> are being used and so Kerberos should be configured
>> for NFS. That is just not the case.
> 
> My assumption is that administrators would prefer automatic
> client set up, and good security by default.

I don't think we can make any assumptions about what admins
want. They want strong security, but not with NFS... That's
their choice, not ours.

> 
> There's no way to know in advance whether an administrator
> will want sec=sys and sec=krb5 mounts on the same system.
> /etc/fstab can be changed at any time, mounts can be done
> by hand, or the administrator can add or remove principals
> from /etc/krb5.keytab.
> 
> Our clients have to work when there are just sec=sys
> mounts, or when there are sec=sys and sec=krb5 mounts.
> They must allow on-demand configuration of sec=krb5. They
> must attempt to provide the best possible level of security
> at all times.
> 
> The out-of-the-shrinkwrap configuration must assume a mix
> of capabilities.

I agree... And they are... But if they know for a fact that
their client(s) will never want to use secure mounts, and
I'm sure there are a few out there, I see no problem in
not starting a service they will never use.

> 
> 
>>> In fact, I think that's going to be pretty common. Why add
>>> an NFS service principal on a client if you don't expect
>>> to use sec=krb5 some of the time?
>> In that case adding the principal does make sense. But...
>> 
>> Why *must* you add a principal when you know only sec=sys
>> mounts will be used?
> 
> Explained in detail above (and this is only for NFSv4, and
> is not at all a _must_). But in summary:
> 
> A client will attempt to use Kerberos for NFSv4 sec=sys when
> there is a host/ or nfs/ principal in its keytab. That needs
> to be documented.
> 
> Our _recommendation_ is that the server be provisioned with
> an nfs/ principal as well when NFSv4 is used in an environment
> where Kerberos is present. This eliminates a costly per-mount
> security negotiation, and enables cryptographically strong
> authentication of each client that mounts that server. NFSv4
> sec=sys works properly otherwise without this principal.

That was beautifully said... and I agree with it all...
But the customer is going to turn around and tell me to go
pound sand... Because they are not about to touch their
server!!! :-)

Especially when all they have to do is disable a service on
the client where the hang is occurring.

steved.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
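Chuck's server-side case analysis from this message can be summarized as a small decision table. A minimal sketch (Python, purely illustrative — the function and argument names are invented, this is not nfs-utils code):

```python
# Illustrative summary of the cases discussed above: what a client doing
# NFSv4 security negotiation (the krb5i upcall for SETCLIENTID) should
# see, depending on how the server is provisioned. Hypothetical names.

def expected_negotiation(server_has_keytab, server_has_nfs_principal,
                         gss_proxy_running):
    if server_has_keytab and server_has_nfs_principal and gss_proxy_running:
        return "GSS context established; no delay"
    if server_has_keytab and gss_proxy_running:
        # Keytab but no nfs/ principal: init sec context refused at once.
        return "immediate refusal; no delay"
    if server_has_keytab:
        # gss-proxy not running: this is the hang Steve reports observing.
        return "delay; needs troubleshooting"
    # No keytab at all: the client has to sense the absence of GSS support.
    return "brief sensing delay on first contact"
```

The third case matches Steve's observation that the mount hangs when gss-proxy is not running; per Chuck, that case is the one that needs troubleshooting rather than papering over.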
On 06/28/2016 01:23 PM, Weston Andros Adamson wrote: > >> On Jun 28, 2016, at 12:27 PM, Chuck Lever <chuck.lever@oracle.com> wrote: >> >>> >>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <SteveD@redhat.com> wrote: >>> >>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) >>> >>> On 06/23/2016 09:30 PM, Chuck Lever wrote: >>>> >>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: >>> >>> [snip] >>> >>>>> the key tab does have a nfs/hosname@REALM entry. So the >>>>> call to the KDC is probably failing... which >>>>> could be construed as a misconfiguration, but >>>>> that misconfiguration should not even come into >>>>> play with sec=sys mounts... IMHO... >>>> >>>> I disagree, of course. sec=sys means the client is not going >>>> to use Kerberos to authenticate individual user requests, >>>> and users don't need a Kerberos ticket to access their files. >>>> That's still the case. >>>> >>>> I'm not aware of any promise that sec=sys means there is >>>> no Kerberos within 50 miles of that mount. >>> I think that's is the assumption... No Kerberos will be >>> needed for sec=sys mounts. Its not when Kerberos is >>> not configured. >> >> NFSv3 sec=sys happens to mean that no Kerberos is needed. >> This hasn't changed either. >> >> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and >> NFSv4 ID mapping, and NFSv4 locking, and so on. >> >> Note though that Kerberos isn't needed for NFSv4 sec=sys >> even when there is a keytab. The client negotiates and >> operates without it. >> >> >>>> If there are valid keytabs on both systems, they need to >>>> be set up correctly. If there's a misconfiguration, then >>>> gssd needs to report it precisely instead of time out. >>>> And it's just as easy to add a service principal to a keytab >>>> as it is to disable a systemd service in that case. 
>>> I think its more straightforward to disable a service >>> that is not needed than to have to add a principal to a >>> keytab for a service that's not being used or needed. >> >> IMO automating NFS setup so that it chooses the most >> secure possible settings without intervention is the >> best possible solution. >> >> >>>>>> Is gssd waiting for syslog or something? >>>>> No... its just failing to get the machine creds for root >>>> >>>> Clearly more is going on than that, and so far we have only >>>> some speculation. Can you provide an strace of rpc.gssd or >>>> a network capture so we can confirm what's going on? >>> Yes... Yes... and Yes.. I added you to the bz... >> >> Thanks! I'll have a look at it. >> >> >>>>> [snip] >>>>> >>>>>>> Which does work and will still work... but I'm thinking it is >>>>>>> much similar to disable the service via systemd command >>>>>>> systemctl disable rpc-gssd >>>>>>> >>>>>>> than creating and editing those .conf files. >>>>>> >>>>>> This should all be automatic, IMO. >>>>>> >>>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 >>>>>> to your mounts. No reboot, nothing to restart. Linux should be >>>>>> that simple. >>>>> The only extra step with Linux is to 'sysctmctl start rpc-gssd' >>>>> I don't there is much would can do about that.... >>>> >>>> Sure there is. Leave gssd running, and make sure it can respond >>>> quickly in every reasonable case. :-p >>>> >>>> >>>>> But of >>>>> course... Patches are always welcomed!! 8-) >>>>> >>>>> TBL... When kerberos is configured correctly for NFS everything >>>>> works just fine. When kerberos is configured, but not for NFS, >>>>> causes delays on all NFS mounts. >>>> >>>> This convinces me even more that there is a gssd issue here. >>>> >>>> >>>>> Today, there is a method to stop rpc-gssd from blindly starting >>>>> when kerberos is configured to eliminate that delay. >>>> >>>> I can fix my broken TV by not turning it on, and I don't >>>> notice the problem. 
But the problem is still there any >>>> time I want to watch TV. >>>> >>>> The problem is not fixed by disabling gssd, it's just >>>> hidden in some cases. >>> I agree this %100... All I'm saying there should be a >>> way to disable it when the daemon is not needed or used. >> >> NFSv4 sec=sys *does* use Kerberos, when it is available. >> It has for years. >> >> Documentation should be updated to state that if Kerberos >> is configured on clients, they will attempt to use it to >> manage some operations that are common to all NFSv4 mount >> points on that client, even when a mount point uses sec=sys. >> >> Kerberos will be used for user authentication only if the >> client administrator has not specified a sec= setting, but >> the server export allows the use of Kerberos; or if the >> client administrator has specified a sec=krb5, sec=krb5i, >> or sec=krb5p setting. >> >> The reason for using Kerberos for common operations is >> that a client may have just one lease management principal. >> If the client uses sec=sys and sec=krb5 mounts, and the >> sec=sys mount is done first, then lease management would use >> sys as well. The client cannot change this principal after >> it has established a lease and files are open. >> >> A subsequent sec=krb5 mount will also use sec=sys for >> lease management. This will be surprising and insecure >> behavior. Therefore, all mounts from this client attempt >> to set up a krb5 lease management transport. > > Chuck, > > Thanks for explaining this so well! This definitely should make > it’s way into documentation - we should have added something > like this a long time ago. I agree... where should it go? the mount.nfs man page?? steved. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
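Chuck's lease-management explanation above can be sketched roughly as follows (Python, illustrative only — the helper names are made up, and this is a model of the described behavior, not the actual kernel or gssd logic):

```python
# Rough model of the behavior described above: the first mount fixes the
# flavor used for NFSv4 lease management (SETCLIENTID and friends), so a
# client holding a machine credential tries krb5 for lease management
# even when every mount uses sec=sys. All names here are hypothetical.

def machine_cred_available(keytab_principals):
    """A host/ or nfs/ principal in the keytab acts as the machine cred."""
    return any(p.startswith(("host/", "nfs/")) for p in keytab_principals)

def lease_management_flavor(keytab_principals):
    # Try Kerberos first when a machine credential exists, so that a
    # later sec=krb5 mount does not inherit an insecure sys-based lease.
    return "krb5i" if machine_cred_available(keytab_principals) else "sys"
```

This is why disabling rpc.gssd makes the delay disappear: with no daemon to answer the upcall, the client falls back to AUTH_SYS for lease management immediately.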
> On Jun 28, 2016, at 2:12 PM, Steve Dickson <SteveD@redhat.com> wrote: > > > >> On 06/28/2016 01:23 PM, Weston Andros Adamson wrote: >> >>>> On Jun 28, 2016, at 12:27 PM, Chuck Lever <chuck.lever@oracle.com> wrote: >>>> >>>> >>>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <SteveD@redhat.com> wrote: >>>> >>>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) >>>> >>>>> On 06/23/2016 09:30 PM, Chuck Lever wrote: >>>>> >>>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: >>>> >>>> [snip] >>>> >>>>>> the key tab does have a nfs/hosname@REALM entry. So the >>>>>> call to the KDC is probably failing... which >>>>>> could be construed as a misconfiguration, but >>>>>> that misconfiguration should not even come into >>>>>> play with sec=sys mounts... IMHO... >>>>> >>>>> I disagree, of course. sec=sys means the client is not going >>>>> to use Kerberos to authenticate individual user requests, >>>>> and users don't need a Kerberos ticket to access their files. >>>>> That's still the case. >>>>> >>>>> I'm not aware of any promise that sec=sys means there is >>>>> no Kerberos within 50 miles of that mount. >>>> I think that's is the assumption... No Kerberos will be >>>> needed for sec=sys mounts. Its not when Kerberos is >>>> not configured. >>> >>> NFSv3 sec=sys happens to mean that no Kerberos is needed. >>> This hasn't changed either. >>> >>> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and >>> NFSv4 ID mapping, and NFSv4 locking, and so on. >>> >>> Note though that Kerberos isn't needed for NFSv4 sec=sys >>> even when there is a keytab. The client negotiates and >>> operates without it. >>> >>> >>>>> If there are valid keytabs on both systems, they need to >>>>> be set up correctly. If there's a misconfiguration, then >>>>> gssd needs to report it precisely instead of time out. >>>>> And it's just as easy to add a service principal to a keytab >>>>> as it is to disable a systemd service in that case. 
>>>> I think its more straightforward to disable a service >>>> that is not needed than to have to add a principal to a >>>> keytab for a service that's not being used or needed. >>> >>> IMO automating NFS setup so that it chooses the most >>> secure possible settings without intervention is the >>> best possible solution. >>> >>> >>>>>>> Is gssd waiting for syslog or something? >>>>>> No... its just failing to get the machine creds for root >>>>> >>>>> Clearly more is going on than that, and so far we have only >>>>> some speculation. Can you provide an strace of rpc.gssd or >>>>> a network capture so we can confirm what's going on? >>>> Yes... Yes... and Yes.. I added you to the bz... >>> >>> Thanks! I'll have a look at it. >>> >>> >>>>>> [snip] >>>>>> >>>>>>>> Which does work and will still work... but I'm thinking it is >>>>>>>> much similar to disable the service via systemd command >>>>>>>> systemctl disable rpc-gssd >>>>>>>> >>>>>>>> than creating and editing those .conf files. >>>>>>> >>>>>>> This should all be automatic, IMO. >>>>>>> >>>>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5 >>>>>>> to your mounts. No reboot, nothing to restart. Linux should be >>>>>>> that simple. >>>>>> The only extra step with Linux is to 'sysctmctl start rpc-gssd' >>>>>> I don't there is much would can do about that.... >>>>> >>>>> Sure there is. Leave gssd running, and make sure it can respond >>>>> quickly in every reasonable case. :-p >>>>> >>>>> >>>>>> But of >>>>>> course... Patches are always welcomed!! 8-) >>>>>> >>>>>> TBL... When kerberos is configured correctly for NFS everything >>>>>> works just fine. When kerberos is configured, but not for NFS, >>>>>> causes delays on all NFS mounts. >>>>> >>>>> This convinces me even more that there is a gssd issue here. >>>>> >>>>> >>>>>> Today, there is a method to stop rpc-gssd from blindly starting >>>>>> when kerberos is configured to eliminate that delay. 
>>>>> >>>>> I can fix my broken TV by not turning it on, and I don't >>>>> notice the problem. But the problem is still there any >>>>> time I want to watch TV. >>>>> >>>>> The problem is not fixed by disabling gssd, it's just >>>>> hidden in some cases. >>>> I agree this %100... All I'm saying there should be a >>>> way to disable it when the daemon is not needed or used. >>> >>> NFSv4 sec=sys *does* use Kerberos, when it is available. >>> It has for years. >>> >>> Documentation should be updated to state that if Kerberos >>> is configured on clients, they will attempt to use it to >>> manage some operations that are common to all NFSv4 mount >>> points on that client, even when a mount point uses sec=sys. >>> >>> Kerberos will be used for user authentication only if the >>> client administrator has not specified a sec= setting, but >>> the server export allows the use of Kerberos; or if the >>> client administrator has specified a sec=krb5, sec=krb5i, >>> or sec=krb5p setting. >>> >>> The reason for using Kerberos for common operations is >>> that a client may have just one lease management principal. >>> If the client uses sec=sys and sec=krb5 mounts, and the >>> sec=sys mount is done first, then lease management would use >>> sys as well. The client cannot change this principal after >>> it has established a lease and files are open. >>> >>> A subsequent sec=krb5 mount will also use sec=sys for >>> lease management. This will be surprising and insecure >>> behavior. Therefore, all mounts from this client attempt >>> to set up a krb5 lease management transport. >> >> Chuck, >> >> Thanks for explaining this so well! This definitely should make >> it’s way into documentation - we should have added something >> like this a long time ago. > I agree... where should it go? the mount.nfs man page?? nfs(5) is where this kind of thing typically goes. 
> On Jun 28, 2016, at 2:11 PM, Steve Dickson <SteveD@redhat.com> wrote: > > > >> On 06/28/2016 12:27 PM, Chuck Lever wrote: >> >>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <SteveD@redhat.com> wrote: >>> >>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-) >>> >>>> On 06/23/2016 09:30 PM, Chuck Lever wrote: >>>> >>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <SteveD@redhat.com> wrote: >>> >>> [snip] >>> >>>>> the key tab does have a nfs/hosname@REALM entry. So the >>>>> call to the KDC is probably failing... which >>>>> could be construed as a misconfiguration, but >>>>> that misconfiguration should not even come into >>>>> play with sec=sys mounts... IMHO... >>>> >>>> I disagree, of course. sec=sys means the client is not going >>>> to use Kerberos to authenticate individual user requests, >>>> and users don't need a Kerberos ticket to access their files. >>>> That's still the case. >>>> >>>> I'm not aware of any promise that sec=sys means there is >>>> no Kerberos within 50 miles of that mount. >>> I think that's is the assumption... No Kerberos will be >>> needed for sec=sys mounts. Its not when Kerberos is >>> not configured. >> >> NFSv3 sec=sys happens to mean that no Kerberos is needed. >> This hasn't changed either. >> >> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and >> NFSv4 ID mapping, and NFSv4 locking, and so on. >> >> Note though that Kerberos isn't needed for NFSv4 sec=sys >> even when there is a keytab. The client negotiates and >> operates without it. > If there is a keytab... there will be rpc.gssd runnning > which will cause an upcall... and the negotiation starts > with krb5i.. So yes its not needed but it will be tried. > >> >> >>>> If there are valid keytabs on both systems, they need to >>>> be set up correctly. If there's a misconfiguration, then >>>> gssd needs to report it precisely instead of time out. 
>>>> And it's just as easy to add a service principal to a keytab >>>> as it is to disable a systemd service in that case. >>> I think its more straightforward to disable a service >>> that is not needed than to have to add a principal to a >>> keytab for a service that's not being used or needed. >> >> IMO automating NFS setup so that it chooses the most >> secure possible settings without intervention is the >> best possible solution. > Sure... now back to the point. ;-) > > >>>> >>>> The problem is not fixed by disabling gssd, it's just >>>> hidden in some cases. >>> I agree this %100... All I'm saying there should be a >>> way to disable it when the daemon is not needed or used. >> >> NFSv4 sec=sys *does* use Kerberos, when it is available. >> It has for years. > Right... lets define "available" when rpc.gssd is running. > When rpc.gssd is not running Kerberos is not available. OK, but now whenever a change to the Kerberos configuration on the host is made (the keytab is created or destroyed, or a principal is added or removed from the keytab), an extra step is needed to ensure secure NFS is working properly. Should we go farther and say that, if there happen to be no sec=krb5[ip] mounts on the system, gssd should be shut down? I mean, it's not being used, so let's turn it off! There is a host/ principal in the keytab. That means Kerberos is active on that system, and gssd can use it. That means it is possible that an administrator (or automounter) may specify sec=krb5 at some point during the life of this client. For me that means gssd should be running on this system. Another way to achieve your goal is to add a command line option to gssd which specifies which principal in the local keytab to use as the machine credential. Specify a principal that is not in the keytab, and gssd should do no negotiation at all, and will return immediately. There may already be a command line option to do this (I'm not at liberty to confirm my memory at the moment). 
That would be an immediate solution for this customer, if provisioning an nfs/ service principal on their server is still anathema, and no other code change is needed. >> Documentation should be updated to state that if Kerberos >> is configured on clients, they will attempt to use it to >> manage some operations that are common to all NFSv4 mount >> points on that client, even when a mount point uses sec=sys. >> >> Kerberos will be used for user authentication only if the >> client administrator has not specified a sec= setting, but >> the server export allows the use of Kerberos; or if the >> client administrator has specified a sec=krb5, sec=krb5i, >> or sec=krb5p setting. >> >> The reason for using Kerberos for common operations is >> that a client may have just one lease management principal. >> If the client uses sec=sys and sec=krb5 mounts, and the >> sec=sys mount is done first, then lease management would use >> sys as well. The client cannot change this principal after >> it has established a lease and files are open. >> >> A subsequent sec=krb5 mount will also use sec=sys for >> lease management. This will be surprising and insecure >> behavior. Therefore, all mounts from this client attempt >> to set up a krb5 lease management transport. >> >> The server should have an nfs/ service principal. It >> doesn't _require_ one, but it's a best practice to have >> one in place. > Yeah our documentation is lacking in this area... > >> >> Administrators that have Kerberos available should use >> it. There's no overhead to enabling it on NFS servers, >> as long as the list of security flavors the server >> returns for each export does not include Kerberos >> flavors. > Admins are going to do what they want to no matter > what we say... IMHO... > >> >> >>> Having it automatically started just because there is a >>> keytab, at first, I thought was a good idea, now >>> it turns not people really don't what miscellaneous >>> daemons running. Case in point gssproxy... 
Automatically >>> comes but there is a way to disable it. With rpc.gssd >>> there is not (easily). >> >> There are good reasons to disable daemons: >> >> - The daemon consumes a lot of resources. >> - The daemon exposes an attack surface. >> >> gssd does neither. > How about not needed? no rpc.gssd.. no upcall... no problem... ;-) >> There are good reasons not to disable daemons: > I'm assuming you meant "to disable" or "not to enable" here. No, I meant exactly what I wrote. Let's rewrite it "good reasons to leave a daemon enabled" >> - It enables simpler administration. >> - It keeps the test matrix narrow (because you >> have to test just one configuration, not >> multiple ones: gssd enabled, gssd disabled, >> and so on). >> >> Always enabling gssd provides both of these benefits. > This is a production environment so there is no testing I meant QA testing by the distributor. Without the extra knob, the QA tester has to test only the configuration where gssd is enabled. Whenever you add a knob like this, you have to double your QA test matrix. > but simpler admin is never a bad thing. > >> >> >>>>> This patch just tweaking that method to make things easier. >>>> >>>> It makes one thing easier, and other things more difficult. >>>> As a community, I thought our goal was to make Kerberos >>>> easier to use, not easier to turn off. >>> Again I can't agree with you more! But this is the case >>> were Kerberos is *not* being used for NFS... we should >>> make that case work as well... >> >> Agreed. >> >> But NFSv4 sec=sys *does* use Kerberos when Kerberos is >> configured on the system. It's a fact, and we now need to >> make it convenient and natural and bug-free. The choice is >> between increasing security and just making it work, or >> adding one more knob that administrators have to Google for. > If they do not want use Kerberos for NFS, whether is a good > idea or not, we can not force them to... Or can we? No-one is forcing anyone to do anything. 
>>>>> To address your concern about covering up a bug. I just don't >>>>> see it... The code is doing exactly what its asked to do. >>>>> By default the kernel asks krb5i context (when rpc.gssd >>>>> is run). rpc.gssd looking for a principle in the key tab, >>>>> when found the KDC is called... >>>>> >>>>> Everything is working just like it should and it is >>>>> failing just like it should. I'm just trying to >>>>> eliminate all this process when not needed, in >>>>> an easier way.. >>>> >>>> I'm not even sure now what the use case is. The client has >>>> proper principals, but the server doesn't? The server >>>> should refuse the init sec context immediately. Is gssd >>>> even running on the server? >>> No they don't because they are not using Kerberos for NFS... >> >> OK, let's state clearly what's going on here: >> >> >> The client has a host/ principal. gssd is started >> automatically. >> >> >> The server has what? > No info on the server other than its Linux and the > nfs server is running. > >> >> If the server has a keytab and an nfs/ principal, >> gss-proxy should be running, and there are no delays. > In my testing when gss-proxy is not runnning the mount > hangs. > >> >> If the server has a keytab and no nfs/ principal, >> gss-proxy should be running, and any init sec >> context should fail immediately. There should be no >> delay. (If there is a delay, that needs to be >> troubleshot). >> >> If the server does not have a keytab, gss-proxy will >> not be running, and NFSv4 clients will have to sense >> this. It takes a moment for each sniff. Otherwise, >> there's no operational difference. >> >> >> I'm assuming then that the problem is that Kerberos >> is not set up on the _server_. Can you confirm this? > I'll try... but we should have to force people to > set up Kerberos on server they are not going to use. I say one more time: no-one is forcing anyone to do anything. 
>> Also, this negotiation should be done only during >> the first contact of each server after a client >> reboot, thus the delay happens only during the first >> mount, not during subsequent ones. Can that also be >> confirmed? > It appears it happen on all of them. Can this customer's observed behavior be reproduced in vitro? Seems like there are many unknowns here, and it would make sense to get more answers before proposing a long-term change to our administrative interfaces. >>> So I guess this is what we are saying: >>> >>> If you what to used Kerberos for anything at all, >>> they must configure it for NFS for their clients >>> to work properly... I'm not sure we really want to >>> say this. >> >> Well, the clients are working properly without the >> server principal in place. They just have an extra >> delay at mount time. (you yourself pointed out in >> an earlier e-mail that the client is doing everything >> correctly, and no mention has been made of any other >> operational issue). > This appeared to be the case. > >> >> We should encourage customers to set up in the most >> secure way possible. In this case: >> >> - Kerberos is already available in the environment >> >> - It's not _required_ only _recommended_ (clients can >> still use sec=sys without it) for the server to >> enable Kerberos, but it's a best practice >> >> I'm guessing that if gssd and gss-proxy are running on >> the server all the time, even when there is no keytab, >> that delay should go away for everyone. So: >> >> - Always run a gssd service on servers that export NFSv4 >> (I assume this will address the delay problem) >> >> - Recommend the NFS server be provisioned with an nfs/ >> principal, and explicitly specify sec=sys on exports >> to prevent clients from negotiating an unwanted Kerberos >> security setting > Or don't start rpc.gssd... ;-) > >> >> I far prefer these fixes to adding another administrative >> setting on the client. 
It encourages better security, and >> it addresses the problem for all NFS clients that might >> want to try using Kerberos against Linux NFS servers, for >> whatever reason. > As you say we can only recommend... If they don't > now want to use secure mounts in a Kerberos environment > we should not make them, is all I'm saying. I don't see that I'm proposing otherwise. I've simply described the recommended best practice. NFSv4 sec=sys works fine with or without Kerberos present. However, if there is a KDC available, and the client is provisioned with a host/ principal, we recommend adding an nfs/ service principal to the NFS server. sec=sys still works in the absence of said principal. How is that forcing anything? In the specific case for your customer, it's simply not clear why the delays occur. More information is needed before it makes sense to propose a code change. >>>> Suppose there are a thousand clients and one broken >>>> server. An administrator would fix that one server by >>>> adding an extra service principal, rather than log >>>> into a thousand clients to change a setting on each. >>>> >>>> Suppose your client wants both sys and krb5 mounts of >>>> a group of servers, and some are "misconfigured." >>>> You have to enable gssd on the client but there are still >>>> delays on the sec=sys mounts. >>> In both these cases you are assuming Kerberos mounts >>> are being used and so Kerberos should be configured >>> for NFS. That is just not the case. >> >> My assumption is that administrators would prefer automatic >> client set up, and good security by default. > I don't think we can make any assumption what admins want. > They want strong security, but not with NFS... That's > their choice, not ours. >> There's no way to know in advance whether an administrator >> will want sec=sys and sec=krb5 mounts on the same system. 
>> /etc/fstab can be changed at any time, mounts can be done >> by hand, or the administrator can add or remove principals >> from /etc/krb5.keytab. >> >> Our clients have to work when there are just sec=sys >> mounts, or when there are sec=sys and sec=krb5 mounts. >> They must allow on-demand configuration of sec=krb5. They >> must attempt to provide the best possible level of security >> at all times. >> >> The out-of-the-shrinkwrap configuration must assume a mix >> of capabilities. > I agree... And they are... But if they know for a fact, that > their client(s) will never want to use secure mount, which > I'm sure there a few out there, I see no problem in > not starting a service they well never use. Why "force" an admin to worry about whether some random service is running or not? IMO the mechanism (one or more daemons, a systemctl service, the use of a keyring, or using The Force) should be transparent to the administrator, who should care only about security policy settings. The whole idea of having separate services for enabling NFS security is confusing IMO. The default is sec=sys, but as soon as you vary from that, things get wonky. It also makes it much harder for distributors or upstream developers to make alterations to this mechanism while not altering the administrative interfaces. I have to check whether "SECURE=YES" is uncommented in /etc/sysconfig/nfs. I have to check whether nfs.target includes nfs-secure.service. None of this is obvious or desirable, and after all is said and done I usually miss something and have to Google anyway, before a valid krb5.conf and adding "sec=krb5" works properly. And the only reason we have this complication is because someone complained once about extra daemons running. It's just superstition. Why can't it be simple for all sec= settings? >>>> In fact, I think that's going to be pretty common. Why add >>>> an NFS service principal on a client if you don't expect >>>> to use sec=krb5 some of the time? 
>>> In that case adding the principal does make sense. But...
>>>
>>> Why *must* you add a principal when you know only sec=sys
>>> mounts will be used?

>> Explained in detail above (and this is only for NFSv4, and
>> is not at all a _must_). But in summary:
>>
>> A client will attempt to use Kerberos for NFSv4 sec=sys when
>> there is a host/ or nfs/ principal in its keytab. That needs
>> to be documented.
>>
>> Our _recommendation_ is that the server be provisioned with
>> an nfs/ principal as well when NFSv4 is used in an environment
>> where Kerberos is present. This eliminates a costly per-mount
>> security negotiation, and enables cryptographically strong
>> authentication of each client that mounts that server. NFSv4
>> sec=sys works properly otherwise without this principal.

> That was beautifully said... and I agree with all of it...
> But the customer is going to turn around and tell me to go
> pound sand... Because they are not about to touch their
> server!!! :-)

What if this customer came back and said "We also want this to
work with NFSv2 on UDP?" Would you still want to accommodate
them?

If they don't want to provision an nfs/ service principal it
would be really helpful for us to know why. IMO the community
should not accommodate anyone who refuses to use a best
practice without a reason. Is there a reason?

> Esp. when all they have to do is disable a service on the
> client where the hang is occurring.

They could also use NFSv3.
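For reference, the keytab state that drives this behavior can be
inspected and fixed from the command line. This is only a sketch: the
hostname server.example.com and realm EXAMPLE.COM are placeholders,
and the exact kadmin invocation depends on the site's KDC setup.

```
# On the client: list the keytab. A host/ or nfs/ principal here is
# what causes rpc.gssd to attempt Kerberos for NFSv4 sec=sys mounts.
klist -k /etc/krb5.keytab

# On the NFS server (hypothetical names; needs kadmin privileges):
# provision the recommended nfs/ service principal and store its key.
kadmin -q "addprinc -randkey nfs/server.example.com@EXAMPLE.COM"
kadmin -q "ktadd nfs/server.example.com@EXAMPLE.COM"
```

With the nfs/ principal in the server's keytab, the per-mount security
negotiation succeeds immediately instead of timing out.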
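For sites that want /etc/krb5.keytab present but no gss daemons, the
patch's [Install] sections make the override a one-liner, replacing
the drop-in file the old README described. A sketch of both styles
(the ConditionPathExists line is assumed from the unit's "/etc/krb5.keytab
exists" comment, and is abridged here):

```
# New style: remove the WantedBy symlinks so the units never start.
#   systemctl disable rpc-gssd rpc-svcgssd

# Old style, per the previous README: a per-service drop-in, e.g.
# /etc/systemd/system/rpc-gssd.service.d/01-disable.conf, containing
# a condition that can never be satisfied:
[Unit]
ConditionNull=false
```

The new style is preferable because `systemctl is-enabled rpc-gssd`
then reports the unit's state accurately, rather than the unit
appearing enabled but silently failing its start condition.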