Message ID: 164721984672.11933.15475930163427511814@noble.neil.brown.name (mailing list archive)
State: New, archived
Series: [v2] nfs.man: document requirements for NFSv4 identity
On 13 Mar 2022, at 21:04, NeilBrown wrote: > When mounting NFS filesystem in a network namespace using v4, some > care > must be taken to ensure a unique and stable client identity. Similar > case is needed for NFS-root and other situations. > > Add documentation explaining the requirements for the NFS identity in > these situations. > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > > I think I've address most of the feedback, but please forgive and > remind > if I missed something. > NeilBrown > > utils/mount/nfs.man | 109 > +++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 108 insertions(+), 1 deletion(-) > > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man > index d9f34df36b42..5f15abe8cf72 100644 > --- a/utils/mount/nfs.man > +++ b/utils/mount/nfs.man > @@ -1,7 +1,7 @@ > .\"@(#)nfs.5" > .TH NFS 5 "9 October 2012" > .SH NAME > -nfs \- fstab format and options for the > +nfs \- fstab format and configuration for the > .B nfs > file systems > .SH SYNOPSIS > @@ -1844,6 +1844,113 @@ export pathname, but not both, during a > remount. For example, > merges the mount option > .B ro > with the mount options already saved on disk for the NFS server > mounted at /mnt. > +.SH "NFS CLIENT IDENTIFIER" > +NFSv4 requires that the client present a unique identifier to the > server > +to be used to track state such as file locks. By default Linux NFS > uses > +the host name, as configured at the time of the first NFS mount, > +together with some fixed content such as the name "Linux NFS" and the > +particular protocol version. When the hostname is guaranteed to be > +unique among all client which access the same server this is > sufficient. > +If hostname uniqueness cannot be assumed, extra identity information > +must be provided. > +.PP > +Some situations which are known to be problematic with respect to > unique > +host names include: > +.IP \- 2 > +NFS-root (diskless) clients, where the DCHP server (or equivalent) > does > +not provide a unique host name. > +.IP \- 2 > +"containers" within a single Linux host. If each container has a > separate > +network namespace, but does not use the UTS namespace to provide a > unique > +host name, then there can be multiple effective NFS clients with the > +same host name. > +.IP \= 2 > +Clients across multiple administrative domains that access a common > NFS > +server. If assignment of host name is devolved to separate domains, > +uniqueness cannot be guaranteed, unless a domain name is included in > the > +host name. > +.SS "Increasing Client Uniqueness" > +Apart from the host name, which is the preferred way to differentiate > +NFS clients, there are two mechanisms to add uniqueness to the > +client identifier. > +.TP > +.B nfs.nfs4_unique_id > +This module parameter can be set to an arbitrary string at boot time, > or > +when the > +.B nfs > +module is loaded. This might be suitable for configuring diskless > clients. > +.TP > +.B /sys/fs/nfs/client/net/identifier > +This virtual file (available since Linux 5.3) is local to the network > +name-space in which it is accessed and so can provided uniqueness > between s/provided/provide/ > +network namespaces (containers) when the hostname remains uniform. > +.RS > +.PP > +This value is empty on name-space creation. > +If the value is to be set, that should be done before the first > +mount.
If the container system has access to some sort of > per-container > +identity then that identity, possibly obfuscated as a UUID is privacy is s/is privacy is/if privacy is/ > +needed, can be used. Combining the identity with the name of the > +container systems would also help. For example: > +.RS 4 > +echo "ip-netns:`ip netns identify`" \\ > +.br > + > /sys/fs/nfs/client/net/identifier > +.br > +uuidgen --sha1 --namespace @url \\ > +.br > + -N "nfs:`cat /etc/machine-id`" \\ > +.br > + > /sys/fs/nfs/client/net/identifier > +.RE > +If the container system provides no stable name, > +but does have stable storage, then something like > +.RS 4 > +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && > +.br > +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier > +.RE > +would suffice. > +.PP > +If a container has neither a stable name nor stable (local) storage, > +then it is not possible to provide a stable identifier, so providing > +a random identifier to ensure uniqueness would be best > +.RS 4 > +uuidgen > /sys/fs/nfs/client/net/identifier > +.RE > +.RE > +.SS Consequences of poor identity setting > +Any two concurrent clients that might access the same server must > have > +different identifiers for correct operation, and any two consecutive > +instances of the same client should have the same identifier for > optimal > +crash recovery. > +.PP > +If two different clients present the same identity to a server there > are > +two possible scenarios. If the clients use the same credential then > the > +server will treat them as the same client which appears to be > restarting > +frequently. One client may manage to open some files etc, but as > soon > +as the other client does anything the first client will lose access > and > +need to re-open everything. > +.PP > +If the clients use different credentials, then the second client to > +establish a connection to the server will be refused access. For > +.B auth=sys > +the credential is based on hostname, so will be the same if the > +identities are the same. With > +.B auth=krb > +the credential is stored in > +.I /etc/krb5.keytab > +and will be the same only if this is copied among hosts. > +.PP > +If the identity is unique but not stable, for example if it is > generated > +randomly on each start up of the NFS client, then crash recovery is > +affected. When a client shuts down uncleanly and restarts, the > server > +will normally detect this because the same identity is presented with There's ambiguity on "this"; it could be the situation described in the previous sentence. How about: s/will normally detect this/will normally detect the unclean restart/ Ben
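(Pulling the corrected scriptlets together: a minimal sketch of setting the per-netns identifier from /etc/machine-id, assuming the /sys/fs/nfs/client/net/identifier file from Linux 5.3 onward and util-linux uuidgen; the "nfs:" label and the emptiness check are illustrative, not mandated by the patch.)

    #!/bin/sh
    # Sketch: set a per-netns NFSv4 identifier before the first mount,
    # deriving it from /etc/machine-id, obfuscated as a name-based UUID.
    # Assumes Linux 5.3+ and util-linux uuidgen; "nfs:" is illustrative.
    id_file=/sys/fs/nfs/client/net/identifier
    # The file reads back empty until something sets it; set it only once.
    if [ -z "$(cat "$id_file" 2>/dev/null)" ]; then
            uuidgen --sha1 --namespace @url \
                    --name "nfs:$(cat /etc/machine-id)" > "$id_file"
    fi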
Hi Neil- > On Mar 13, 2022, at 9:04 PM, NeilBrown <neilb@suse.de> wrote: > > > When mounting NFS filesystem in a network namespace using v4, some care > must be taken to ensure a unique and stable client identity. Similar > case is needed for NFS-root and other situations. > > Add documentation explaining the requirements for the NFS identity in > these situations. > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > > I think I've address most of the feedback, but please forgive and remind > if I missed something. > NeilBrown > > utils/mount/nfs.man | 109 +++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 108 insertions(+), 1 deletion(-) > > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man > index d9f34df36b42..5f15abe8cf72 100644 > --- a/utils/mount/nfs.man > +++ b/utils/mount/nfs.man > @@ -1,7 +1,7 @@ > .\"@(#)nfs.5" > .TH NFS 5 "9 October 2012" > .SH NAME > -nfs \- fstab format and options for the > +nfs \- fstab format and configuration for the > .B nfs > file systems Suggest "configuration for nfs file systems" (remove "the") > .SH SYNOPSIS > @@ -1844,6 +1844,113 @@ export pathname, but not both, during a remount. For example, > merges the mount option > .B ro > with the mount options already saved on disk for the NFS server mounted at /mnt. > +.SH "NFS CLIENT IDENTIFIER" > +NFSv4 requires that the client present a unique identifier to the server > +to be used to track state such as file locks. By default Linux NFS uses > +the host name, as configured at the time of the first NFS mount, > +together with some fixed content such as the name "Linux NFS" and the > +particular protocol version. When the hostname is guaranteed to be > +unique among all client which access the same server this is sufficient. > +If hostname uniqueness cannot be assumed, extra identity information > +must be provided. The last sentence is made ambiguous by the use of passive voice. Suggest: "When hostname uniqueness cannot be guaranteed, the client administrator must provide extra identity information." I have a problem with basing our default uniqueness guarantee on hostnames "most of the time" hoping it will all work out. There are simply too many common cases where hostname stability can't be relied upon. Our sustaining teams will happily tell us this hope hasn't so far been borne out. I also don't feel that nfs(5) is an appropriate place for this level of detail. Documentation/filesystems/nfs/ is more appropriate IMO. In general, man pages are good for quick summaries, not for explainers. Here, it reads like "you, a user, are going to have to do this thing that is like filling out a tax form" -- in reality it should be information that is: - Ignorable by most folks - Used by distributors to add value by automating set up - Used for debugging large client installations Maybe I'm just stating this to understand the purpose of this patch, but it could also be used as an "Intended audience" disclaimer in this new section. > +.PP > +Some situations which are known to be problematic with respect to unique > +host names include: A little wordy. Suggest: "Situations known to be problematic with respect to unique hostnames include:" If this will eventually become part of nfs(5), I would first run this patch by documentation experts, because they might have a preference for "hostnames" over "host names" and "namespaces" over "name-spaces". Usage of these terms throughout this patch is not consistent.
> +.IP \- 2 > +NFS-root (diskless) clients, where the DCHP server (or equivalent) does > +not provide a unique host name. Suggest this addition: .IP \- 2 Dynamically-assigned hostnames, where the hostname can be changed after a client reboot, while the client is booted, or if a client repeatedly connects to multiple networks (for example if it is moved from home to an office every day). > +.IP \- 2 > +"containers" within a single Linux host. If each container has a separate > +network namespace, but does not use the UTS namespace to provide a unique > +host name, then there can be multiple effective NFS clients with the > +same host name. > +.IP \= 2 .IP \- 2 > +Clients across multiple administrative domains that access a common NFS > +server. If assignment of host name is devolved to separate domains, I don't recognize the phrase "assignment is devolved to separate domains". Can you choose a friendlier way of saying this? > +uniqueness cannot be guaranteed, unless a domain name is included in the > +host name. > +.SS "Increasing Client Uniqueness" > +Apart from the host name, which is the preferred way to differentiate > +NFS clients, there are two mechanisms to add uniqueness to the > +client identifier. > +.TP > +.B nfs.nfs4_unique_id > +This module parameter can be set to an arbitrary string at boot time, or > +when the > +.B nfs > +module is loaded. This might be suitable for configuring diskless clients. Suggest: "This is suitable for" > +.TP > +.B /sys/fs/nfs/client/net/identifier > +This virtual file (available since Linux 5.3) is local to the network > +name-space in which it is accessed and so can provided uniqueness between > +network namespaces (containers) when the hostname remains uniform. ^provided^provide ^between^amongst and the clause at the end confused me. Suggest: "in which it is accessed and thus can provide uniqueness amongst network namespaces (containers)." > +.RS > +.PP > +This value is empty on name-space creation. > +If the value is to be set, that should be done before the first > +mount. If the container system has access to some sort of per-container > +identity then that identity, possibly obfuscated as a UUID is privacy is > +needed, can be used. Combining the identity with the name of the > +container systems would also help. I object to recommending obfuscation via a UUID. 1. This is confusing because there has been no mention of any persistence requirement so far. At this point, a reader might think that the client can simply convert the hostname and netns identifier every time it boots. However this is only OK to do if these things are guaranteed not to change during the lifetime of a client. In a world where a majority of systems get their hostnames dynamically, I think this is a shaky foundation. 2. There's no requirement that this uniquifier be in the form of a UUID anywhere in specifications, and the Linux client itself does not add such a requirement. (You suggested before that we should start by writing down requirements. Using a UUID ain't a requirement). Linux chooses to implement its uniquifier with a UUID because it is assumed we are using a random UUID (rather than a name-based or time-based UUID). A random UUID has strong global uniqueness guarantees, which ensures the client identifier will always be unique amongst clients in nearly all situations for nearly no cost.
If we want to create a good uniquifier here, then combine the hostname, netns identity, and/or the host's machine-id and then hash that blob with a known strong digest algorithm like SHA-256. A man page must not recommend the use of deprecated or insecure obfuscation mechanisms. The man page can suggest a random-based UUID as long as it states plainly that such UUIDs have global uniqueness guarantees that make them suitable for this purpose. We're using a UUID for its global uniqueness properties, not because of its appearance. > For example: > +.RS 4 > +echo "ip-netns:`ip netns identify`" \\ > +.br > + > /sys/fs/nfs/client/net/identifier > +.br > +uuidgen --sha1 --namespace @url \\ > +.br > + -N "nfs:`cat /etc/machine-id`" \\ > +.br > + > /sys/fs/nfs/client/net/identifier > +.RE > +If the container system provides no stable name, > +but does have stable storage, Here's the first mention of "stable". It needs some introduction far above. > then something like > +.RS 4 > +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && > +.br > +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier > +.RE > +would suffice. > +.PP > +If a container has neither a stable name nor stable (local) storage, > +then it is not possible to provide a stable identifier, so providing > +a random identifier to ensure uniqueness would be best > +.RS 4 > +uuidgen > /sys/fs/nfs/client/net/identifier > +.RE > +.RE > +.SS Consequences of poor identity setting This section provides context to understand the above technical recommendations. I suggest this whole section should be moved to near the opening paragraph. > +Any two concurrent clients that might access the same server must have > +different identifiers for correct operation, and any two consecutive > +instances of the same client should have the same identifier for optimal > +crash recovery. Also recovery from network partitions. > +.PP > +If two different clients present the same identity to a server there are > +two possible scenarios. If the clients use the same credential then the > +server will treat them as the same client which appears to be restarting > +frequently. One client may manage to open some files etc, but as soon > +as the other client does anything the first client will lose access and > +need to re-open everything. This seems fuzzy. 1. If locks are lost, then there is a substantial risk of data corruption. 2. Is the client itself supposed to re-open files, or are applications somehow notified that they need to re-open? Either of these scenarios is fraught -- I don't believe any application is coded to expect to have to re-open a file due to exigent circumstances. > +.PP > +If the clients use different credentials, then the second client to > +establish a connection to the server will be refused access. For > +.B auth=sys > +the credential is based on hostname, so will be the same if the > +identities are the same. With > +.B auth=krb > +the credential is stored in > +.I /etc/krb5.keytab > +and will be the same only if this is copied among hosts. This language implies that copying the keytab is a recommended thing to do. It's not. I mentioned it before because some customers think it's OK to use the same keytab across their client fleet. But obviously that will result in lost open and lock state. I suggest rephrasing this last sentence to describe the negative lease recovery consequence of two clients happening to share the same host principal -- as in "This is why you shouldn't share keytabs..." 
> +.PP > +If the identity is unique but not stable, for example if it is generated > +randomly on each start up of the NFS client, then crash recovery is > +affected. When a client shuts down uncleanly and restarts, the server > +will normally detect this because the same identity is presented with > +different boot time (or "incarnation verifier"), and will discard old > +state. If the client presents a different identifier, then the server > +cannot discard old state until the lease time has expired, and the new > +client may be delayed in opening or locking files that it was > +previously accessing. > .SH FILES > .TP 1.5i > .I /etc/fstab > -- > 2.35.1 > -- Chuck Lever
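(A rough sketch of the SHA-256 combination Chuck suggests here, assuming coreutils sha256sum and iproute2; the "nfs-id:" label is illustrative, and per the caveat above the hostname component is only safe if it cannot change over the client's lifetime.)

    #!/bin/sh
    # Sketch: combine hostname, netns identity and machine-id, then hash
    # the blob with a strong digest (SHA-256) to form the uniquifier.
    # Assumes coreutils sha256sum and iproute2; "nfs-id:" is illustrative.
    netns=$(ip netns identify 2>/dev/null)
    printf 'nfs-id:%s:%s:%s' "$(hostname)" "$netns" "$(cat /etc/machine-id)" \
        | sha256sum | awk '{print $1}' > /sys/fs/nfs/client/net/identifier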
Thanks for the typo fixes, Ben - I've applied them to my local copy. NeilBrown
On Tue, 15 Mar 2022, Chuck Lever III wrote: > Hi Neil- > > > On Mar 13, 2022, at 9:04 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > When mounting NFS filesystem in a network namespace using v4, some care > > must be taken to ensure a unique and stable client identity. Similar > > case is needed for NFS-root and other situations. > > > > Add documentation explaining the requirements for the NFS identity in > > these situations. > > > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > > > I think I've address most of the feedback, but please forgive and remind > > if I missed something. > > NeilBrown > > > > utils/mount/nfs.man | 109 +++++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 108 insertions(+), 1 deletion(-) > > > > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man > > index d9f34df36b42..5f15abe8cf72 100644 > > --- a/utils/mount/nfs.man > > +++ b/utils/mount/nfs.man > > @@ -1,7 +1,7 @@ > > .\"@(#)nfs.5" > > .TH NFS 5 "9 October 2012" > > .SH NAME > > -nfs \- fstab format and options for the > > +nfs \- fstab format and configuration for the > > .B nfs > > file systems > > Suggest "configuration for nfs file systems" (remove "the") Agreed. > > > > .SH SYNOPSIS > > @@ -1844,6 +1844,113 @@ export pathname, but not both, during a remount. For example, > > merges the mount option > > .B ro > > with the mount options already saved on disk for the NFS server mounted at /mnt. > > +.SH "NFS CLIENT IDENTIFIER" > > +NFSv4 requires that the client present a unique identifier to the server > > +to be used to track state such as file locks. By default Linux NFS uses > > +the host name, as configured at the time of the first NFS mount, > > +together with some fixed content such as the name "Linux NFS" and the > > +particular protocol version. When the hostname is guaranteed to be > > +unique among all client which access the same server this is sufficient. > > +If hostname uniqueness cannot be assumed, extra identity information > > +must be provided. > > The last sentence is made ambiguous by the use of passive voice. > > Suggest: "When hostname uniqueness cannot be guaranteed, the client > administrator must provide extra identity information." Why must the client administrator do this? Why can't some automated tool do this? Or some container-building environment. That's an advantage of the passive voice: you don't need to assign responsibility for the verb. > > I have a problem with basing our default uniqueness guarantee on > hostnames "most of the time" hoping it will all work out. There > are simply too many common cases where hostname stability can't be > relied upon. Our sustaining teams will happily tell us this hope > hasn't so far been borne out. Maybe it has not been borne out because there is no documented requirement for it that we can point people to. Clearly containers that use NFS are not currently all configured well to do this. Some change is needed. Maybe adding a unique host name is the easiest change ... or maybe not. Surely NFS is not the *only* service that uses the host name. Encouraging the use of unique host names might benefit others. The practical reality is that a great many NFS client installations do currently depend on unique host names - after all, it actually works. Is it really so unreasonable to try to encourage the exceptions to fit the common pattern better? > > I also don't feel that nfs(5) is an appropriate place for this level > of detail. Documentation/filesystems/nfs/ is more appropriate IMO.
> In general, man pages are good for quick summaries, not for > explainers. Here, it reads like "you, a user, are going to have to > do this thing that is like filling out a tax form" -- in reality it > should be information that should be: > > - Ignorable by most folks > - Used by distributors to add value by automating set up > - Used for debugging large client installations nfs(5) contains sections on TRANSPORT METHODS, DATA AND METADATA COHERENCE, SECURITY CONSIDERATIONS. Is this section really out of place? I could agree that all of these sections belong in "section 7" (Overview, conventions, and miscellaneous) rather than "section 5" (File formats and configuration files) but we don't have nfs.7 (yet). I think section 7 is a reasonable fit for your 3 points above. I don't agree that Documentation/filesystems/nfs/ is sufficient. That is (from my perspective) primarily of interest to kernel developers. The whole point of this exercise is that we need to reach people outside of that group. > > Maybe I'm just stating this to understand the purpose of this > patch, but it could also be used as an "Intended audience" > disclaimer in this new section. OK, so the "purpose of this patch" relates in part to a comment you made earlier, which I include here: > Since it is just a line or two of code, it might be of little > harm just to go with separate implementations for now and stop > talking about it. If it sucks, we can fix the suckage. > > Who volunteers to implement this mechanism in mount.nfs ? I don't think this is the best next step. I think we need to get some container system developer to contribute here. So far we only have second hand anecdotes about problems. I think the most concrete is from Ben suggesting that in at least one container system, using /etc/machine-id is a good idea. I don't think we can change nfs-utils (whether mount.nfs or mount.conf or some other way) to set identity from /etc/machine-id for everyone. So we need at least for that container system to request that change. How would they like to do that? I suggest that we explain the problem to representatives of the various container communities that we have contact with (Well... "you", more than "we" as I don't have contacts). We could use the documentation I provided to clearly present the problem. Then ask: - would you like to just run some shell code (see examples) - or would you like to provide an /etc/nfs.conf.d/my-container.conf - or would you like to run a tool that we provide - or is there already a push to provide unique container hostnames, and is this the incentive you need to help that push across the line? If we have someone from $CONTAINER_COMMUNITY say "if you do this thing, then we will use it", then that would be hard to argue with. If we could get two or three different communities to comment, I expect the best answer would become a lot more obvious. But first we, ourselves, need to agree on the document :-) > > > > +.PP > > +Some situations which are known to be problematic with respect to unique > > +host names include: > > A little wordy. > > Suggest: "Situations known to be problematic with respect to unique > hostnames include:" Yep. > > If this will eventually become part of nfs(5), I would first run > this patch by documentation experts, because they might have a > preference for "hostnames" over "host names" and "namespaces" over > "name-spaces". Usage of these terms throughout this patch is not > consistent.
I've made it consistently "hostname" and "namespace", which is consistent with the rest of the document. > > > > +.IP \- 2 > > +NFS-root (diskless) clients, where the DCHP server (or equivalent) does > > +not provide a unique host name. > > Suggest this addition: > > .IP \- 2 > > Dynamically-assigned hostnames, where the hostname can be changed after > a client reboot, while the client is booted, or if a client often > repeatedly connects to multiple networks (for example if it is moved > from home to an office every day). This is a different kettle of fish. The hostname is *always* included in the identifier. If it isn't stable, then the identifier isn't stable. I saw in the history that when you introduced the module parameter it replaced the hostname. This caused problems in containers (which had different host names) so Trond changed it so the module parameter supplemented the hostname. If hostnames are really so poorly behaved I can see there might be a case to suppress the hostname, but we don't have that option in current kernels. Should we add it? > > > > +.IP \- 2 > > +"containers" within a single Linux host. If each container has a separate > > +network namespace, but does not use the UTS namespace to provide a unique > > +host name, then there can be multiple effective NFS clients with the > > +same host name. > > +.IP \= 2 > > .IP \- 2 Thanks. > > > > +Clients across multiple administrative domains that access a common NFS > > +server. If assignment of host name is devolved to separate domains, > > I don't recognize the phrase "assignment is devolved to separate domains". > Can you choose a friendlier way of saying this? > If hostnames are not assigned centrally then uniqueness cannot be guaranteed unless a domain name is included in the hostname. > > > +uniqueness cannot be guaranteed, unless a domain name is included in the > > +host name. > > +.SS "Increasing Client Uniqueness" > > +Apart from the host name, which is the preferred way to differentiate > > +NFS clients, there are two mechanisms to add uniqueness to the > > +client identifier. > > +.TP > > +.B nfs.nfs4_unique_id > > +This module parameter can be set to an arbitrary string at boot time, or > > +when the > > +.B nfs > > +module is loaded. This might be suitable for configuring diskless clients. > > Suggest: "This is suitable for" OK > > > > +.TP > > +.B /sys/fs/nfs/client/net/identifier > > +This virtual file (available since Linux 5.3) is local to the network > > +name-space in which it is accessed and so can provided uniqueness between > > +network namespaces (containers) when the hostname remains uniform. > > ^provided^provide > > ^between^amongst > > and the clause at the end confused me. > > Suggest: "in which it is accessed and thus can provide uniqueness > amongst network namespaces (containers)." The clause at the end was simply emphasising that the identifier is only needed if the hostname does not vary across containers. I have removed it. > > > > +.RS > > +.PP > > +This value is empty on name-space creation. > > +If the value is to be set, that should be done before the first > > +mount. If the container system has access to some sort of per-container > > +identity then that identity, possibly obfuscated as a UUID is privacy is > > +needed, can be used. Combining the identity with the name of the > > +container systems would also help. > > I object to recommending obfuscation via a UUID. > > 1. This is confusing because there has been no mention of any > persistence requirement so far.
At this point, a reader > might think that the client can simply convert the hostname > and netns identifier every time it boots. However this is > only OK to do if these things are guaranteed not to change > during the lifetime of a client. In a world where a majority > of systems get their hostnames dynamically, I think this is > a shaky foundation. If the hostname changes after boot (weird concept .. does that really happen?) that is irrelevant. The hostname is copied at boot by NFS, and if it is included in the /sys/fs/nfs/client/identifier (which would be pointless, but not harmful) it has again been copied. If it is different on subsequent boots, then that is a big problem and not one that we can currently fix. ....except that a non-persistent client identifier isn't an enormous problem, just a possible cause of delays. > > 2. There's no requirement that this uniquifier be in the form > of a UUID anywhere in specifications, and the Linux client > itself does not add such a requirement. (You suggested > before that we should start by writing down requirements. > Using a UUID ain't a requirement). The requirement here is that /etc/machine-id is documented as requiring obfuscation. uuidgen is a convenient way to provide obfuscation. That is all I was trying to say. > > Linux chooses to implement its uniquifier with a UUID because > it is assumed we are using a random UUID (rather than a > name-based or time-based UUID). A random UUID has strong > global uniqueness guarantees, which ensures the client > identifier will always be unique amongst clients in nearly > all situations for nearly no cost. > "Linux chooses" what does that mean? I've lost the thread here, sorry. > If we want to create a good uniquifier here, then combine the > hostname, netns identity, and/or the host's machine-id and then > hash that blob with a known strong digest algorithm like > SHA-256. A man page must not recommend the use of deprecated or > insecure obfuscation mechanisms. I didn't realize the hash that uuidgen uses was deprecated. Is there some better way to provide an app-specific obfuscation of a string from the command line? Maybe echo nfs-id:`cat /etc/machine-id` | sha256sum ?? > > The man page can suggest a random-based UUID as long as it > states plainly that such UUIDs have global uniqueness guarantees > that make them suitable for this purpose. We're using a UUID > for its global uniqueness properties, not because of its > appearance. So I could use "/etc/nfsv4-identity" instead of "/etc/nfs4-uuid". What else should I change/add? > > > For example: > > +.RS 4 > > +echo "ip-netns:`ip netns identify`" \\ > > +.br > > + > /sys/fs/nfs/client/net/identifier > > +.br > > +uuidgen --sha1 --namespace @url \\ > > +.br > > + -N "nfs:`cat /etc/machine-id`" \\ > > +.br > > + > /sys/fs/nfs/client/net/identifier > > +.RE > > +If the container system provides no stable name, > > +but does have stable storage, > > Here's the first mention of "stable". It needs some > introduction far above. True. So the first para becomes: NFSv4 requires that the client present a stable unique identifier to the server to be used to track state such as file locks. By default Linux NFS uses the hostname, as configured at the time of the first NFS mount, together with some fixed content such as the name "Linux NFS" and the particular protocol version. When the hostname is guaranteed to be unique among all clients which access the same server, and stable across reboots, this is sufficient.
If hostname uniqueness cannot be assumed, extra identity information must be provided. If the hostname is not stable, unclean restarts may suffer unavoidable delays. > > > > then something like > > +.RS 4 > > +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && > > +.br > > +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier > > +.RE > > +would suffice. > > +.PP > > +If a container has neither a stable name nor stable (local) storage, > > +then it is not possible to provide a stable identifier, so providing > > +a random identifier to ensure uniqueness would be best > > +.RS 4 > > +uuidgen > /sys/fs/nfs/client/net/identifier > > +.RE > > +.RE > > +.SS Consequences of poor identity setting > > This section provides context to understand the above technical > recommendations. I suggest this whole section should be moved > to near the opening paragraph. I seem to keep moving things upwards.... something has to come last. Maybe a "(See below)" at the end of the revised first para? > > > > +Any two concurrent clients that might access the same server must have > > +different identifiers for correct operation, and any two consecutive > > +instances of the same client should have the same identifier for optimal > > +crash recovery. > > Also recovery from network partitions. A network partition doesn't coincide with two consecutive instances of the same client. There is just one client instance and one server instance. > > > > +.PP > > +If two different clients present the same identity to a server there are > > +two possible scenarios. If the clients use the same credential then the > > +server will treat them as the same client which appears to be restarting > > +frequently. One client may manage to open some files etc, but as soon > > +as the other client does anything the first client will lose access and > > +need to re-open everything. > > This seems fuzzy. > > 1. If locks are lost, then there is a substantial risk of data > corruption. > > 2. Is the client itself supposed to re-open files, or are > applications somehow notified that they need to re-open? > Either of these scenarios is fraught -- I don't believe any > application is coded to expect to have to re-open a file > due to exigent circumstances. I wasn't very happy with the description either. I think we want some detail, but not too much. The "re-opening" that I mentioned is the NFS client resubmitting NFS OPEN requests, not the application having to re-open. However if the application manages to get a lock, then when the "other" client connects to the server the application will lose the lock, and all read/write accesses on the relevant fd will result in EIO (I think). Clearly bad. I wanted to say the clients could end up "fighting" with each other - the EXCHANGE_ID from one destroys the state set up by the other - but that seems to be too much anthropomorphism. If two different clients present the same identity to a server there are two possible scenarios. If the clients use the same credential then the server will treat them as the same client which appears to be restarting frequently. The clients will each enter a loop where they establish state with the server and then find that the state has been destroyed by the other client and so will need to establish it again. ??? > > > > +.PP > > +If the clients use different credentials, then the second client to > > +establish a connection to the server will be refused access.
For > > +.B auth=sys > > +the credential is based on hostname, so will be the same if the > > +identities are the same. With > > +.B auth=krb > > +the credential is stored in > > +.I /etc/krb5.keytab > > +and will be the same only if this is copied among hosts. > > This language implies that copying the keytab is a recommended thing > to do. It's not. I mentioned it before because some customers think > it's OK to use the same keytab across their client fleet. But obviously > that will result in lost open and lock state. > > I suggest rephrasing this last sentence to describe the negative lease > recovery consequence of two clients happening to share the same host > principal -- as in "This is why you shouldn't share keytabs..." > How about .PP If the clients use different credentials, then the second client to establish a connection to the server will be refused access which is a safer failure mode. For .B auth=sys the credential is based on hostname, so will be the same if the identities are the same. With .B auth=krb the credential is stored in .I /etc/krb5.keytab so providing this isn't copied among clients, the safer failure mode will result. ?? Thanks for your detailed review! NeilBrown > > > +.PP > > +If the identity is unique but not stable, for example if it is generated > > +randomly on each start up of the NFS client, then crash recovery is > > +affected. When a client shuts down uncleanly and restarts, the server > > +will normally detect this because the same identity is presented with > > +different boot time (or "incarnation verifier"), and will discard old > > +state. If the client presents a different identifier, then the server > > +cannot discard old state until the lease time has expired, and the new > > +client may be delayed in opening or locking files that it was > > +previously accessing. > > .SH FILES > > .TP 1.5i > > .I /etc/fstab > > -- > > 2.35.1 > > > > -- > Chuck Lever > > > >
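(The persistent-storage variant Neil floats above would look roughly like this; /etc/nfsv4-identity is the name proposed in this thread, not an established path.)

    #!/bin/sh
    # Sketch: generate a random UUID once, keep it on stable storage, and
    # replay it into the per-netns identifier file on each boot, before
    # the first NFSv4 mount.
    id_store=/etc/nfsv4-identity
    [ -s "$id_store" ] || uuidgen > "$id_store"
    cat "$id_store" > /sys/fs/nfs/client/net/identifier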
Howdy Neil- > On Mar 14, 2022, at 8:41 PM, NeilBrown <neilb@suse.de> wrote: > > On Tue, 15 Mar 2022, Chuck Lever III wrote: >> Hi Neil- >> >>> On Mar 13, 2022, at 9:04 PM, NeilBrown <neilb@suse.de> wrote: >>> >>> >>> When mounting NFS filesystem in a network namespace using v4, some care >>> must be taken to ensure a unique and stable client identity. Similar >>> case is needed for NFS-root and other situations. >>> >>> Add documentation explaining the requirements for the NFS identity in >>> these situations. >>> >>> Signed-off-by: NeilBrown <neilb@suse.de> >>> --- >>> >>> I think I've address most of the feedback, but please forgive and remind >>> if I missed something. >>> NeilBrown >>> >>> utils/mount/nfs.man | 109 +++++++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 108 insertions(+), 1 deletion(-) >>> >>> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man >>> index d9f34df36b42..5f15abe8cf72 100644 >>> --- a/utils/mount/nfs.man >>> +++ b/utils/mount/nfs.man >>> @@ -1,7 +1,7 @@ >>> .\"@(#)nfs.5" >>> .TH NFS 5 "9 October 2012" >>> .SH NAME >>> -nfs \- fstab format and options for the >>> +nfs \- fstab format and configuration for the >>> .B nfs >>> file systems >> >> Suggest "configuration for nfs file systems" (remove "the") > > Agreed. > >> >> >>> .SH SYNOPSIS >>> @@ -1844,6 +1844,113 @@ export pathname, but not both, during a remount. For example, >>> merges the mount option >>> .B ro >>> with the mount options already saved on disk for the NFS server mounted at /mnt. >>> +.SH "NFS CLIENT IDENTIFIER" >>> +NFSv4 requires that the client present a unique identifier to the server >>> +to be used to track state such as file locks. By default Linux NFS uses >>> +the host name, as configured at the time of the first NFS mount, >>> +together with some fixed content such as the name "Linux NFS" and the >>> +particular protocol version. When the hostname is guaranteed to be >>> +unique among all client which access the same server this is sufficient. >>> +If hostname uniqueness cannot be assumed, extra identity information >>> +must be provided. >> >> The last sentence is made ambiguous by the use of passive voice. >> >> Suggest: "When hostname uniqueness cannot be guaranteed, the client >> administrator must provide extra identity information." > > Why must the client administrator do this? Why can't some automated > tool do this? Or some container-building environment. > That's an advantage of the passive voice, you don't need to assign > responsibility for the verb. My point is that in order to provide the needed information, elevated privilege is required. The current sentence reads as if J. Random User could be interrupted at some point and asked for help. In other words, the documentation should state that this is an administrative task. Here I'm not advocating for a specific mechanism to actually perform that task. >> I have a problem with basing our default uniqueness guarantee on >> hostnames "most of the time" hoping it will all work out. There >> are simply too many common cases where hostname stability can't be >> relied upon. Our sustaining teams will happily tell us this hope >> hasn't so far been born out. > > Maybe it has not been born out because there is no documented > requirement for it that we can point people to. > Clearly containers that use NFS are not currently all configured well to do > this. Some change is needed. Maybe adding a unique host name is the > easiest change ... or maybe not. You seem to be documenting the client's current behavior. 
The tone of the documentation is that this behavior is fine and works for most people. It's the second part that I disagree with. Oracle Linux has bugs documenting that this behavior is a problem, and I'm sure Red Hat does too. The current behavior is broken. It is this brokenness that we are trying to resolve. So let me make a stronger statement: we should not document that broken behavior in nfs(5). Instead, we should fix that behavior, and then document the golden brown and delicious behavior. Updating nfs(5) first is putting DeCarte in front of de horse. > Surely NFS is not the *only* service that uses the host name. > Encouraging the use of unique host names might benefit others. Unless you have specific use cases that might benefit from ensuring hostname uniqueness, I would beg that you stay focused on the immediate issue of how the Linux client constructs its nfs_client_id4 strings. > The practical reality is that a great many NFS client installations do > currently depend on unique host names - after all, it actually works. > Is it really so unreasonable to try to encourage the exceptions to fit > the common pattern better? Yes it is unreasonable. NFS servers typically have a fixed DNS presence. They have to because clients mount by hostname. NFS clients, on the other hand, are not under that constraint. The only time I can think of where a client has to have a fixed hostname is if a krb5 host principal is involved. In so many other cases, e.g. mobile computing or elastic services, the client hostname is mutable. I don't think it's fair to put another constraint on host naming here, especially one with implications of service denial or data corruption (see below). >> Maybe I'm just stating this to understand the purpose of this >> patch, but it could also be used as an "Intended audience" >> disclaimer in this new section. > > OK, so the "purpose of this patch" relates in part to a comment you made > earlier, which I include here: > >> Since it is just a line or two of code, it might be of little >> harm just to go with separate implementations for now and stop >> talking about it. If it sucks, we can fix the suckage. >> >> Who volunteers to implement this mechanism in mount.nfs ? > > I don't think this is the best next step. I think we need to get some > container system developer to contribute here. So far we only have > second hand anecdotes about problems. I think the most concrete is from > Ben suggesting that in at least one container system, using > /etc/machine-id is a good idea. > > I don't think we can change nfs-utils (whether mount.nfs or mount.conf > or some other way) to set identity from /etc/machine-id for everyone. > So we need at least for that container system to request that change. > > How would they like to do that? > > I suggest that we explain the problem to representatives of the various > container communities that we have contact with (Well... "you", more > than "we" as I don't have contacts). I'm all for involving one or more container experts. But IMO it's not appropriate to update our man page to do that. Let's update nfs(5) when we are done with this effort. > We could use the documentation I provided to clearly present the > problem. No doubt, we need a crisp problem statement!
> Then ask: > - would you like to just run some shell code (see examples) > - or would you like to provide an /etc/nfs.conf.d/my-container.conf > - or would you like to run a tool that we provide > - or is there already a push to provide unique container hostnames, > and is this the incentive you need to help that push across the > line? > > If we have someone from $CONTAINER_COMMUNITY say "if you do this thing, > then we will use it", then that would be hard to argue with. > If we could get two or three different communities to comment, I expect > the best answer would become a lot more obvious. > > But first we, ourselves, need to agree on the document :-) If the community is seeking help, then a wiki might be a better place to formulate a problem statement. >>> +.PP >>> +Some situations which are known to be problematic with respect to unique >>> +host names include: >> >> A little wordy. >> >> Suggest: "Situations known to be problematic with respect to unique >> hostnames include:" > > Yep. > >> >> If this will eventually become part of nfs(5), I would first run >> this patch by documentation experts, because they might have a >> preference for "hostnames" over "host names" and "namespaces" over >> "name-spaces". Usage of these terms throughout this patch is not >> consistent. > > I've made it consistently "hostname" and "namespace" which is consistent > with the rest of the document > >> >> >>> +.IP \- 2 >>> +NFS-root (diskless) clients, where the DCHP server (or equivalent) does >>> +not provide a unique host name. >> >> Suggest this addition: >> >> .IP \- 2 >> >> Dynamically-assigned hostnames, where the hostname can be changed after >> a client reboot, while the client is booted, or if a client often >> repeatedly connects to multiple networks (for example if it is moved >> from home to an office every day). > > This is a different kettle of fish. The hostname is *always* included > in the identifier. If it isn't stable, then the identifier isn't > stable. > > I saw in the history that when you introduced the module parameter it > replaced the hostname. This caused problems in containers (which had > different host names) so Trond changed it so the module parameter > supplemented the hostname. > > If hostnames are really so poorly behaved I can see there might be a > case to suppress the hostname, but we don't have that option is current > kernels. Should we add it? I claim that it has become problematic to use the hostname in the nfs_client_id4 string. 25 years ago when NFSv4.0 was being crafted, it was assumed that client hostnames were unchanging. The original RFC 3010 recommended adding the hostname, the client IP address, and the server IP address to the nfs_client_id4 string. Since then, we've learned that the IP addresses are quite mutable, and thus not appropriate for a fixed identifier. I argue that the client's hostname is now the same. The Linux NFSv4 prototype and subsequent production code used the local hostname because it's easy to access in the kernel via the UTS name. That was adequate 20 years ago, but has become less so over time. You can view this evolution in the commit log. It doesn't seem that complicated (to me) to divorce the client_id4 string from the local hostname, and the benefits are significant. >>> +.IP \- 2 >>> +"containers" within a single Linux host. 
If each container has a separate >>> +network namespace, but does not use the UTS namespace to provide a unique >>> +host name, then there can be multiple effective NFS clients with the >>> +same host name. >>> +.IP \= 2 >> >> .IP \- 2 > > Thanks. > >> >> >>> +Clients across multiple administrative domains that access a common NFS >>> +server. If assignment of host name is devolved to separate domains, >> >> I don't recognize the phrase "assignment is devolved to separate domains". >> Can you choose a friendlier way of saying this? >> > > If hostnames are not assigned centrally then uniqueness cannot be > guaranteed unless a domain name is included in the hostname. Better, thanks. >>> +.RS >>> +.PP >>> +This value is empty on name-space creation. >>> +If the value is to be set, that should be done before the first >>> +mount. If the container system has access to some sort of per-container >>> +identity then that identity, possibly obfuscated as a UUID is privacy is >>> +needed, can be used. Combining the identity with the name of the >>> +container systems would also help. >> >> I object to recommending obfuscation via a UUID. >> >> 1. This is confusing because there has been no mention of any >> persistence requirement so far. At this point, a reader >> might think that the client can simply convert the hostname >> and netns identifier every time it boots. However this is >> only OK to do if these things are guaranteed not to change >> during the lifetime of a client. In a world where a majority >> of systems get their hostnames dynamically, I think this is >> a shaky foundation. > > If the hostname changes after boot (weird concept .. does that really > happen?) that is irrelevant. It really happens. A DHCP lease renewal can do it. Moving to a new subnet on the same campus might do it. I can open "Device Settings" on my laptop and change my laptop's hostname on a whim. Joining a VPN might do it. A client might have multiple network interfaces, each with a unique hostname. Which one should be used for the nfs_client_id4 string? RFCs 7931 and 8587 discuss how trunking needs to work: the upshot is that the client needs to have one consistent nfs_client_id4 string it presents to all servers (in case of migration) no matter which network path it uses to access the server. > The hostname is copied at boot by NFS, and > if it is included in the /sys/fs/nfs/client/identifier (which would be > pointless, but not harmful) it has again been copied. > > If it is different on subsequent boots, then that is a big problem and > not one that we can currently fix. Yes, we can fix it: don't use the client's hostname but instead use a separate persistent uniquifier, as has been proposed. > ....except that a non-persistent client identifier isn't an enormous > problem, just a possible cause of delays. I disagree, it's a significant issue. - If locks are lost, that is a potential source of data corruption. - If a lease is stolen, that is a denial of service. Our customers take this very seriously. The NFS client's out-of-the-shrink-wrap default behavior/configuration should be conservative enough to prevent these issues. Customers store mission critical data via NFS. Most customers expect NFS to work reliably without a lot of configuration fuss. >> 2. There's no requirement that this uniquifier be in the form >> of a UUID anywhere in specifications, and the Linux client >> itself does not add such a requirement. (You suggested >> before that we should start by writing down requirements.
>> Using a UUID ain't a requirement). > > The requirement here is that /etc/machine-id is documented as requiring > obfuscation. uuidgen is a convenient way to provide obfuscation. That > is all I was trying to say. Understood, but the words you used have some additional implications that you might not want. >> Linux chooses to implement its uniquifier with a UUID because >> it is assumed we are using a random UUID (rather than a >> name-based or time-based UUID). A random UUID has strong >> global uniqueness guarantees, which ensures the client >> identifier will always be unique amongst clients in nearly >> all situations for nearly no cost. >> > > "Linux chooses" what does that mean? I've lost the thread here, sorry. Try instead: "The documentation regarding the nfs4_unique_id module parameter suggests the use of a UUID because..." >> If we want to create a good uniquifier here, then combine the >> hostname, netns identity, and/or the host's machine-id and then >> hash that blob with a known strong digest algorithm like >> SHA-256. A man page must not recommend the use of deprecated or >> insecure obfuscation mechanisms. > > I didn't realize the hash that uuidgen uses was deprecated. Is there > some better way to provide an app-specific obfuscation of a string from > the command line? > > Maybe > echo nfs-id:`cat /etc/machine-id`| sha256sum > > ?? Something like that, yes. But the scriptlet needs to also involve the netns identity somehow. >> The man page can suggest a random-based UUID as long as it >> states plainly that such UUIDs have global uniqueness guarantees >> that make them suitable for this purpose. We're using a UUID >> for its global uniqueness properties, not because of its >> appearance. > > So I could use "/etc/nfsv4-identity" instead of "/etc/nfs4-uuid". I like. I would prefer not using "uuid" in the name. Ben and Steve were resistant to that idea, though. > What else should I change/add? > >> >> >>> For example: >>> +.RS 4 >>> +echo "ip-netns:`ip netns identify`" \\ >>> +.br >>> + > /sys/fs/nfs/client/net/identifier >>> +.br >>> +uuidgen --sha1 --namespace @url \\ >>> +.br >>> + -N "nfs:`cat /etc/machine-id`" \\ >>> +.br >>> + > /sys/fs/nfs/client/net/identifier >>> +.RE >>> +If the container system provides no stable name, >>> +but does have stable storage, >> >> Here's the first mention of "stable". It needs some >> introduction far above. > > True. So the first para becomes: > > NFSv4 requires that the client present a stable unique identifier to > the server to be used to track state such as file locks. By default > Linux NFS uses the hostname, as configured at the time of the first > NFS mount, together with some fixed content such as the name "Linux > NFS" and the particular protocol version. When the hostname is > guaranteed to be unique among all clients which access the same server, > and stable across reboots, this is sufficient. If hostname uniqueness > cannot be assumed, extra identity information must be provided. If > the hostname is not stable, unclean restarts may suffer unavoidable > delays. See above. The impact is more extensive than "unavoidable delays." >>> then something like >>> +.RS 4 >>> +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && >>> +.br >>> +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier >>> +.RE >>> +would suffice.
>>> +.PP >>> +If a container has neither a stable name nor stable (local) storage, >>> +then it is not possible to provide a stable identifier, so providing >>> +a random identifier to ensure uniqueness would be best >>> +.RS 4 >>> +uuidgen > /sys/fs/nfs/client/net/identifier >>> +.RE >>> +.RE >>> +.SS Consequences of poor identity setting >> >> This section provides context to understand the above technical >> recommendations. I suggest this whole section should be moved >> to near the opening paragraph. > > I seem to keep moving things upwards.... something has to come last. > Maybe a "(See below)" at the end of the revised first para? > >> >> >>> +Any two concurrent clients that might access the same server must have >>> +different identifiers for correct operation, and any two consecutive >>> +instances of the same client should have the same identifier for optimal >>> +crash recovery. >> >> Also recovery from network partitions. > > A network partition doesn't coincide with two consecutive instances of the > same client. There is just one client instance and one server instance. It's possible for one of the peers to reboot during the network partition. >>> +.PP >>> +If two different clients present the same identity to a server there are >>> +two possible scenarios. If the clients use the same credential then the >>> +server will treat them as the same client which appears to be restarting >>> +frequently. One client may manage to open some files etc, but as soon >>> +as the other client does anything the first client will lose access and >>> +need to re-open everything. >> >> This seems fuzzy. >> >> 1. If locks are lost, then there is a substantial risk of data >> corruption. >> >> 2. Is the client itself supposed to re-open files, or are >> applications somehow notified that they need to re-open? >> Either of these scenarios is fraught -- I don't believe any >> application is coded to expect to have to re-open a file >> due to exigent circumstances. > > I wasn't very happy with the description either. I think we want some > detail, but not too much. > > The "re-opening" that I mentioned is the NFS client resubmitting NFS > OPEN requests, not the application having to re-open. > However if the application manages to get a lock, then when the "other" > client connects to the server the application will lose the lock, and > all read/write accesses on the relevant fd will result in EIO (I > think). Clearly bad. > > I wanted to say the clients could end up "fighting" with each other - > the EXCHANGE_ID from one destroys the state set up by the other - I that > seems to be too much anthropomorphism. > > If two different clients present the same identity to a server there > are two possible scenarios. If the clients use the same credential > then the server will treat them as the same client which appears to > be restarting frequently. The clients will each enter a loop where > they establish state with the server and then find that the state > has been destroy by the other client and so will need to establish > it again. > > ??? My colleague Calum coined the term "lease stealing". That might be a good thing to define somewhere and simply use that term as needed. >>> +.PP >>> +If the clients use different credentials, then the second client to >>> +establish a connection to the server will be refused access. For >>> +.B auth=sys >>> +the credential is based on hostname, so will be the same if the >>> +identities are the same. 
With
>>> +.B auth=krb
>>> +the credential is stored in
>>> +.I /etc/krb5.keytab
>>> +and will be the same only if this is copied among hosts.
>>
>> This language implies that copying the keytab is a recommended thing
>> to do. It's not. I mentioned it before because some customers think
>> it's OK to use the same keytab across their client fleet. But obviously
>> that will result in lost open and lock state.
>>
>> I suggest rephrasing this last sentence to describe the negative lease
>> recovery consequence of two clients happening to share the same host
>> principal -- as in "This is why you shouldn't share keytabs..."
>>
>
> How about
>
>    .PP
>    If the clients use different credentials, then the second client to
>    establish a connection to the server will be refused access which is a
>    safer failure mode.  For
>    .B auth=sys
>    the credential is based on hostname, so will be the same if the
>    identities are the same.  With
>    .B auth=krb
>    the credential is stored in
>    .I /etc/krb5.keytab
>    so provided this isn't copied among clients the safer failure mode will result.

With
.BR auth=krb5 ,
the client uses the host principal in
.I /etc/krb5.keytab
or in some cases, the lone user principal,
to authenticate lease management operations.
This securely prevents lease stealing.

> ??
>
> Thanks for your detailed review!
>
> NeilBrown
>
>>
>>> +.PP
>>> +If the identity is unique but not stable, for example if it is generated
>>> +randomly on each start up of the NFS client, then crash recovery is
>>> +affected.  When a client shuts down uncleanly and restarts, the server
>>> +will normally detect this because the same identity is presented with
>>> +different boot time (or "incarnation verifier"), and will discard old
>>> +state.  If the client presents a different identifier, then the server
>>> +cannot discard old state until the lease time has expired, and the new
>>> +client may be delayed in opening or locking files that it was
>>> +previously accessing.
>>> .SH FILES
>>> .TP 1.5i
>>> .I /etc/fstab
>>> --
>>> 2.35.1
>>>
>>
>> --
>> Chuck Lever

--
Chuck Lever
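For reference, one way the combined scriptlet discussed above might look.
This is a sketch only: it assumes "ip netns identify" is usable in the
environment and that /etc/machine-id is populated, and neither assumption
is guaranteed, as the thread itself notes.

    #!/bin/sh
    # Sketch: derive an obfuscated, per-netns uniquifier from the
    # machine-id plus the netns name, hashed with SHA-256 as discussed.
    # Assumption: "ip netns identify" works here; in many container
    # environments it does not, and the netns component will be empty.
    netns="$(ip netns identify 2>/dev/null)"
    printf 'nfs-id:%s:%s' "$(cat /etc/machine-id)" "$netns" \
        | sha256sum | awk '{print $1}' \
        > /sys/fs/nfs/client/net/identifier

Per the rule stated earlier in the thread, this would need to run before
the first NFS mount in that network namespace.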
On Thu, 17 Mar 2022, Chuck Lever III wrote:
> Howdy Neil-

G'day

> >> The last sentence is made ambiguous by the use of passive voice.
> >>
> >> Suggest: "When hostname uniqueness cannot be guaranteed, the client
> >> administrator must provide extra identity information."
> >
> > Why must the client administrator do this?  Why can't some automated
> > tool do this?  Or some container-building environment.
> > That's an advantage of the passive voice, you don't need to assign
> > responsibility for the verb.
>
> My point is that in order to provide the needed information,
> elevated privilege is required. The current sentence reads as
> if J. Random User could be interrupted at some point and asked
> for help.
>
> In other words, the documentation should state that this is
> an administrative task. Here I'm not advocating for a specific
> mechanism to actually perform that task.

???  This whole man page is primarily about mount options, particularly
as they appear in /etc/fstab.  These are not available to the non-admin.
Why would anyone think this section is any different?

> >
> >> I have a problem with basing our default uniqueness guarantee on
> >> hostnames "most of the time" hoping it will all work out. There
> >> are simply too many common cases where hostname stability can't be
> >> relied upon. Our sustaining teams will happily tell us this hope
> >> hasn't so far been borne out.
> >
> > Maybe it has not been borne out because there is no documented
> > requirement for it that we can point people to.
> > Clearly containers that use NFS are not currently all configured well to do
> > this.  Some change is needed.  Maybe adding a unique host name is the
> > easiest change ... or maybe not.
>
> You seem to be documenting the client's current behavior.
> The tone of the documentation is that this behavior is fine
> and works for most people.

It certainly works for a lot of people.  Many people are using NFSv4
quite effectively.  I'm sure there are people who are having problems
too, but let's not fall for the squeaky wheel fallacy.

>
> It's the second part that I disagree with. Oracle Linux has
> bugs documenting this behavior is a problem, and I'm sure
> Red Hat does too. The current behavior is broken. It is this
> brokenness that we are trying to resolve.

The current behaviour of NFS is NOT broken.  Maybe it is not adequately
robust against certain configuration choices.  Certainly we should make
it as robust as we reasonably can.  But let's not overstate the problem.

>
> So let me make a stronger statement: we should not
> document that broken behavior in nfs(5). Instead, we should
> fix that behavior, and then document the golden brown and
> delicious behavior. Updating nfs(5) first is putting
> DeCarte in front of de horse.
>
>
> > Surely NFS is not the *only* service that uses the host name.
> > Encouraging the use of unique host names might benefit others.
>
> Unless you have specific use cases that might benefit from
> ensuring hostname uniqueness, I would beg that you stay
> focused on the immediate issue of how the Linux client
> constructs its nfs_client_id4 strings.
>
>
> > The practical reality is that a great many NFS client installations do
> > currently depend on unique host names - after all, it actually works.
> > Is it really so unreasonable to try to encourage the exceptions to fit
> > the common pattern better?
>
> Yes it is unreasonable.
>
> NFS servers typically have a fixed DNS presence. They have
> to because clients mount by hostname.
>
> NFS clients, on the other hand, are not under that constraint.
> The only time I can think of where a client has to have a
> fixed hostname is if a krb5 host principal is involved.
>
> In so many other cases, e.g. mobile computing or elastic
> services, the client hostname is mutable. I don't think
> it's fair to put another constraint on host naming here,
> especially one with implications of service denial or
> data corruption (see below).
>
>
> >> Maybe I'm just stating this to understand the purpose of this
> >> patch, but it could also be used as an "Intended audience"
> >> disclaimer in this new section.
> >
> > OK, so the "purpose of this patch" relates in part to a comment you made
> > earlier, which I include here:
> >
> >> Since it is just a line or two of code, it might be of little
> >> harm just to go with separate implementations for now and stop
> >> talking about it. If it sucks, we can fix the suckage.
> >>
> >> Who volunteers to implement this mechanism in mount.nfs?
> >
> > I don't think this is the best next step.  I think we need to get some
> > container system developer to contribute here.  So far we only have
> > second-hand anecdotes about problems.  I think the most concrete is from
> > Ben suggesting that in at least one container system, using
> > /etc/machine-id is a good idea.
> >
> > I don't think we can change nfs-utils (whether mount.nfs or mount.conf
> > or some other way) to set identity from /etc/machine-id for everyone.
> > So we need at least for that container system to request that change.
> >
> > How would they like to do that?
> >
> > I suggest that we explain the problem to representatives of the various
> > container communities that we have contact with (Well... "you", more
> > than "we" as I don't have contacts).
>
> I'm all for involving one or more container experts. But IMO
> it's not appropriate to update our man page to do that. Let's
> update nfs(5) when we are done with this effort.

Don't let perfect be the enemy of good.
We were making no progress with "fixing" nfs.  Documenting "how it works
today" should never be a bad thing.  Obviously we can (and must) update
the documentation when we update the behaviour.

But if some concrete behavioural changes can be agreed and implemented
through this discussion, I'm happy for the documentation to land only
after those changes.

>
>
> > We could use the documentation I provided to clearly present the
> > problem.
>
> No doubt, we need a crisp problem statement!
>
>
> > Then ask:
> >  - would you like to just run some shell code (see examples)
> >  - or would you like to provide an /etc/nfs.conf.d/my-container.conf
> >  - or would you like to run a tool that we provide
> >  - or is there already a push to provide unique container hostnames,
> >    and is this the incentive you need to help that push across the
> >    line?
> >
> > If we have someone from $CONTAINER_COMMUNITY say "if you do this thing,
> > then we will use it", then that would be hard to argue with.
> > If we could get two or three different communities to comment, I expect
> > the best answer would become a lot more obvious.
> >
> > But first we, ourselves, need to agree on the document :-)
>
> If the community is seeking help, then a wiki might be a better
> place to formulate a problem statement.
>
>
> >>> +.PP
> >>> +Some situations which are known to be problematic with respect to unique
> >>> +host names include:
> >>
> >> A little wordy.
> >>
> >> Suggest: "Situations known to be problematic with respect to unique
> >> hostnames include:"
> >
> > Yep.
> >
> >>
> >> If this will eventually become part of nfs(5), I would first run
> >> this patch by documentation experts, because they might have a
> >> preference for "hostnames" over "host names" and "namespaces" over
> >> "name-spaces". Usage of these terms throughout this patch is not
> >> consistent.
> >
> > I've made it consistently "hostname" and "namespace", which matches
> > the rest of the document.
> >
> >>
> >>
> >>> +.IP \- 2
> >>> +NFS-root (diskless) clients, where the DCHP server (or equivalent) does
> >>> +not provide a unique host name.
> >>
> >> Suggest this addition:
> >>
> >> .IP \- 2
> >>
> >> Dynamically-assigned hostnames, where the hostname can be changed after
> >> a client reboot, while the client is booted, or if a client often
> >> repeatedly connects to multiple networks (for example if it is moved
> >> from home to an office every day).
> >
> > This is a different kettle of fish.  The hostname is *always* included
> > in the identifier.  If it isn't stable, then the identifier isn't
> > stable.
> >
> > I saw in the history that when you introduced the module parameter it
> > replaced the hostname.  This caused problems in containers (which had
> > different host names) so Trond changed it so the module parameter
> > supplemented the hostname.
> >
> > If hostnames are really so poorly behaved I can see there might be a
> > case to suppress the hostname, but we don't have that option in current
> > kernels.  Should we add it?
>
> I claim that it has become problematic to use the hostname in the
> nfs_client_id4 string.

In that case, we should fix it - make it possible to exclude the
hostname from the nfs_client_id4 string.  You make a convincing case.
Have you thoughts on how we should implement that?
Add a new bool sysfs attribute: identity_includes_hostname which
defaults to true (the current behaviour) but which can be set to false?
Or should it transparently be set to false when the "identity" is set?

>
> 25 years ago when NFSv4.0 was being crafted, it was assumed that
> client hostnames were unchanging. The original RFC 3010 recommended
> adding the hostname, the client IP address, and the server IP
> address to the nfs_client_id4 string.
>
> Since then, we've learned that the IP addresses are quite mutable,
> and thus not appropriate for a fixed identifier. I argue that the
> client's hostname is now the same.
>
> The Linux NFSv4 prototype and subsequent production code used the
> local hostname because it's easy to access in the kernel via the
> UTS name. That was adequate 20 years ago, but has become less so
> over time. You can view this evolution in the commit log.
>
> It doesn't seem that complicated (to me) to divorce the client_id4
> string from the local hostname, and the benefits are significant.
>
>
> >>> +.IP \- 2
> >>> +"containers" within a single Linux host. If each container has a separate
> >>> +network namespace, but does not use the UTS namespace to provide a unique
> >>> +host name, then there can be multiple effective NFS clients with the
> >>> +same host name.
> >>> +.IP \= 2
> >>
> >> .IP \- 2
> >
> > Thanks.
> >
> >>
> >>
> >>> +Clients across multiple administrative domains that access a common NFS
> >>> +server.  If assignment of host name is devolved to separate domains,
> >>
> >> I don't recognize the phrase "assignment is devolved to separate domains".
> >> Can you choose a friendlier way of saying this?
> >>
> >
> > If hostnames are not assigned centrally then uniqueness cannot be
> > guaranteed unless a domain name is included in the hostname.
>
> Better, thanks.
>
>
> >>> +.RS
> >>> +.PP
> >>> +This value is empty on name-space creation.
> >>> +If the value is to be set, that should be done before the first
> >>> +mount.  If the container system has access to some sort of per-container
> >>> +identity then that identity, possibly obfuscated as a UUID is privacy is
> >>> +needed, can be used.  Combining the identity with the name of the
> >>> +container systems would also help.
> >>
> >> I object to recommending obfuscation via a UUID.
> >>
> >> 1. This is confusing because there has been no mention of any
> >>    persistence requirement so far. At this point, a reader
> >>    might think that the client can simply convert the hostname
> >>    and netns identifier every time it boots. However this is
> >>    only OK to do if these things are guaranteed not to change
> >>    during the lifetime of a client. In a world where a majority
> >>    of systems get their hostnames dynamically, I think this is
> >>    a shaky foundation.
> >
> > If the hostname changes after boot (weird concept .. does that really
> > happen?) that is irrelevant.
>
> It really happens. A DHCP lease renewal can do it. Moving to a
> new subnet on the same campus might do it. I can open "Device
> Settings" on my laptop and change my laptop's hostname on a
> whim. Joining a VPN might do it.
>
> A client might have multiple network interfaces, each with a
> unique hostname. Which one should be used for the nfs_client_id4
> string? RFCs 7931 and 8587 discuss how trunking needs to work:
> the upshot is that the client needs to have one consistent
> nfs_client_id4 string it presents to all servers (in case of
> migration) no matter which network path it uses to access the
> server.
>
>
> > The hostname is copied at boot by NFS, and
> > if it is included in the /sys/fs/nfs/client/identifier (which would be
> > pointless, but not harmful) it has again been copied.
> >
> > If it is different on subsequent boots, then that is a big problem and
> > not one that we can currently fix.
>
> Yes, we can fix it: don't use the client's hostname but
> instead use a separate persistent uniquifier, as has been
> proposed.
>
>
> > ....except that non-persistent client identifiers aren't an enormous
> > problem, just a possible cause of delays.
>
> I disagree, it's a significant issue.
>
> - If locks are lost, that is a potential source of data corruption.
>
> - If a lease is stolen, that is a denial of service.
>
> Our customers take this very seriously.

Of course, as they should.  Data integrity is paramount.

A non-persistent client identifier doesn't put that at risk - not in
and of itself.

If a client's identifier changed during the lifetime of one instance of
the client, then that would allow locks to be lost.  That does NOT
happen just because you happen to change the host name.  The hostname is
copied at first use.
It *could* happen if you changed the module parameter or sysfs identity
after the first mount, but I hope we can agree that is not a justifiable
action.

A lease can only be "stolen" by a non-unique identifier, not simply by
non-persistent identifiers.  But maybe this needs a caveat.
If a set of clients are each given host names from time to time which
are, at any moment in time, unique, but are able to "migrate" from one
client to another, then it would be possible for two clients to both
have performed their first NFS mount when they have some common
hostname X.  The "first" was given hostname X at boot time, it mounted
something.  The hostname was subsequently changed to Y and some other
host booted and got X and then mounted from the same server.  This
would be seriously problematic.  I class this as "non-unique" hostnames,
not as non-persistent-identifier.

> The NFS client's
> out-of-the-shrink-wrap default behavior/configuration should be
> conservative enough to prevent these issues. Customers store
> mission critical data via NFS. Most customers expect NFS to work
> reliably without a lot of configuration fuss.

I've been working on the assumption that it is not possible to provide
ideal zero-config behaviour "out-of-the-shrink-wrap".  You have hinted
(or more) a few times that this is your goal.  Certainly a worthy goal if
possible.  Is it possible?

I contend that if there is no common standard for how containers (and
network namespaces in particular) are used, then it is simply not
possible to provide perfect out-of-the-box behaviour.  There *must* be
some local configuration that we cannot enforce through the kernel or
through nfs-utils.  We can offer, but we cannot enforce.  So we must
document.

The very best that we could do would be to provide a random component to
the identifier unless we had a high level of confidence that a unique
identifier had been provided some other way.  I don't know how to get
that high level of confidence in a way that doesn't break working
configurations.
Ben suggested defaulting 'identity' to a random string for any network
namespace other than init.  I don't think that is cautious enough.
Maybe if we did it when the network namespace is not init, but the UTS
namespace is init.  But that feels like a hack and is probably brittle.

Can you suggest *any* way to improve the "out-of-shrink-wrap" behaviour
significantly?

>
>
> >> 2. There's no requirement that this uniquifier be in the form
> >>    of a UUID anywhere in specifications, and the Linux client
> >>    itself does not add such a requirement. (You suggested
> >>    before that we should start by writing down requirements.
> >>    Using a UUID ain't a requirement).
> >
> > The requirement here is that /etc/machine-id is documented as requiring
> > obfuscation.  uuidgen is a convenient way to provide obfuscation.  That
> > is all I was trying to say.
>
> Understood, but the words you used have some additional
> implications that you might not want.
>
>
> >> Linux chooses to implement its uniquifier with a UUID because
> >> it is assumed we are using a random UUID (rather than a
> >> name-based or time-based UUID). A random UUID has strong
> >> global uniqueness guarantees, which guarantees the client
> >> identifier will always be unique amongst clients in nearly
> >> all situations for nearly no cost.
> >>
> >
> > "Linux chooses" what does that mean?  I've lost the thread here, sorry.
>
> Try instead: "The documentation regarding the nfs4_unique_id
> module parameter suggests the use of a UUID because..."

Ahhhh... that makes sense now - thanks.
That documentation needs to be updated.  It still says "used instead of
a system's node name" while the code currently implements "used
together with ..."
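For readers following the parameter under discussion: a minimal sketch of
how nfs4_unique_id is typically set persistently.  The file name and the
value are illustrative only, and the value supplements (does not replace)
the hostname in current kernels, per the behaviour described above.

    # /etc/modprobe.d/nfs4-id.conf  (illustrative file name)
    # Applied when the nfs module loads.  For NFS-root the same
    # parameter can instead be given on the kernel command line as
    #   nfs.nfs4_unique_id=<string>
    options nfs nfs4_unique_id=a0c5c122-7c30-47f4-a1f7-6fd55e7b6e6c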
> >
> >> If we want to create a good uniquifier here, then combine the
> >> hostname, netns identity, and/or the host's machine-id and then
> >> hash that blob with a known strong digest algorithm like
> >> SHA-256. A man page must not recommend the use of deprecated or
> >> insecure obfuscation mechanisms.
> >
> > I didn't realize the hash that uuidgen uses was deprecated.  Is there
> > some better way to provide an app-specific obfuscation of a string from
> > the command line?
> >
> > Maybe
> >    echo nfs-id:`cat /etc/machine-id` | sha256sum
> >
> > ??
>
> Something like that, yes. But the scriptlet needs to also
> involve the netns identity somehow.

Hmmm.. the impression I got from Ben was that the container system
ensured that /etc/machine-id was different in different containers.  So
there would be no need to add anything.  Of course I should make that
explicit in the documentation.

It would be nice if we could always use "ip netns identify", but that
doesn't seem to be generally supported.

>
>
> >> The man page can suggest a random-based UUID as long as it
> >> states plainly that such UUIDs have global uniqueness guarantees
> >> that make them suitable for this purpose. We're using a UUID
> >> for its global uniqueness properties, not because of its
> >> appearance.
> >
> > So I could use "/etc/nfsv4-identity" instead of "/etc/nfs4-uuid".
>
> I like. I would prefer not using "uuid" in the name. Ben and
> Steve were resistant to that idea, though.
>
>
> > What else should I change/add?
> >
> >>
> >>
> >>> For example:
> >>> +.RS 4
> >>> +echo "ip-netns:`ip netns identify`" \\
> >>> +.br
> >>> + > /sys/fs/nfs/client/net/identifier
> >>> +.br
> >>> +uuidgen --sha1 --namespace @url \\
> >>> +.br
> >>> + -N "nfs:`cat /etc/machine-id`" \\
> >>> +.br
> >>> + > /sys/fs/nfs/client/net/identifier
> >>> +.RE
> >>> +If the container system provides no stable name,
> >>> +but does have stable storage,
> >>
> >> Here's the first mention of "stable". It needs some
> >> introduction far above.
> >
> > True.  So the first para becomes:
> >
> >    NFSv4 requires that the client present a stable unique identifier to
> >    the server to be used to track state such as file locks.  By default
> >    Linux NFS uses the hostname, as configured at the time of the first
> >    NFS mount, together with some fixed content such as the name "Linux
> >    NFS" and the particular protocol version.  When the hostname is
> >    guaranteed to be unique among all clients which access the same server,
> >    and stable across reboots, this is sufficient.  If hostname uniqueness
> >    cannot be assumed, extra identity information must be provided.  If
> >    the hostname is not stable, unclean restarts may suffer unavoidable
> >    delays.
>
> See above. The impact is more extensive than "unavoidable delays."
>
>
> >>> then something like
> >>> +.RS 4
> >>> +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid &&
> >>> +.br
> >>> +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier
> >>> +.RE
> >>> +would suffice.
> >>> +.PP
> >>> +If a container has neither a stable name nor stable (local) storage,
> >>> +then it is not possible to provide a stable identifier, so providing
> >>> +a random identifier to ensure uniqueness would be best
> >>> +.RS 4
> >>> +uuidgen > /sys/fs/nfs/client/net/identifier
> >>> +.RE
> >>> +.RE
> >>> +.SS Consequences of poor identity setting
> >>
> >> This section provides context to understand the above technical
> >> recommendations. I suggest this whole section should be moved
> >> to near the opening paragraph.
> >
> > I seem to keep moving things upwards.... something has to come last.
> > Maybe a "(See below)" at the end of the revised first para?
> >
> >>
> >>
> >>> +Any two concurrent clients that might access the same server must have
> >>> +different identifiers for correct operation, and any two consecutive
> >>> +instances of the same client should have the same identifier for optimal
> >>> +crash recovery.
> >>
> >> Also recovery from network partitions.
> >
> > A network partition doesn't coincide with two consecutive instances of the
> > same client.  There is just one client instance and one server instance.
>
> It's possible for one of the peers to reboot during the network
> partition.

True, but is that interesting?
There are situations where the client will lose locks no matter what it
does with its identity.  These don't have any impact on choices of what
you use for identity.
There are also situations where the client won't lose locks.  These are
equally irrelevant.
The only relevant situation (with respect to identifier stability) is
when the server reboots, and the client is able to contact the server
during the grace period.  If it doesn't use the same identity as it
used before, it can then lose locks.

>
>
> >>> +.PP
> >>> +If two different clients present the same identity to a server there are
> >>> +two possible scenarios.  If the clients use the same credential then the
> >>> +server will treat them as the same client which appears to be restarting
> >>> +frequently.  One client may manage to open some files etc, but as soon
> >>> +as the other client does anything the first client will lose access and
> >>> +need to re-open everything.
> >>
> >> This seems fuzzy.
> >>
> >> 1. If locks are lost, then there is a substantial risk of data
> >>    corruption.
> >>
> >> 2. Is the client itself supposed to re-open files, or are
> >>    applications somehow notified that they need to re-open?
> >>    Either of these scenarios is fraught -- I don't believe any
> >>    application is coded to expect to have to re-open a file
> >>    due to exigent circumstances.
> >
> > I wasn't very happy with the description either.  I think we want some
> > detail, but not too much.
> >
> > The "re-opening" that I mentioned is the NFS client resubmitting NFS
> > OPEN requests, not the application having to re-open.
> > However if the application manages to get a lock, then when the "other"
> > client connects to the server the application will lose the lock, and
> > all read/write accesses on the relevant fd will result in EIO (I
> > think).  Clearly bad.
> >
> > I wanted to say the clients could end up "fighting" with each other -
> > the EXCHANGE_ID from one destroys the state set up by the other - but
> > that seems to be too much anthropomorphism.
> >
> >    If two different clients present the same identity to a server there
> >    are two possible scenarios.  If the clients use the same credential
> >    then the server will treat them as the same client which appears to
> >    be restarting frequently.  The clients will each enter a loop where
> >    they establish state with the server and then find that the state
> >    has been destroyed by the other client and so will need to establish
> >    it again.
> >
> > ???
>
> My colleague Calum coined the term "lease stealing". That might be
> a good thing to define somewhere and simply use that term as needed.
>

.PP
If two different clients present the same identity to a server there
are two possible scenarios.
If the clients do not use cryptographic credentials, or use the same
credential, then the server will treat them as the same client which
appears to be restarting frequently.  Each client will effectively
"steal" the lease established by the other and neither will make
useful progress.
.PP
If the clients use different cryptographic credentials, then the second
client to establish a connection to the server will be refused access
which is a safer failure mode.
.PP
Cryptographic credentials used to authenticate lease operations will be
the host principal from
.I /etc/krb5.keytab
or in some cases, the lone user principal.  These securely prevent
lease stealing.

>
> >>> +.PP
> >>> +If the clients use different credentials, then the second client to
> >>> +establish a connection to the server will be refused access.  For
> >>> +.B auth=sys
> >>> +the credential is based on hostname, so will be the same if the
> >>> +identities are the same.  With
> >>> +.B auth=krb
> >>> +the credential is stored in
> >>> +.I /etc/krb5.keytab
> >>> +and will be the same only if this is copied among hosts.
> >>
> >> This language implies that copying the keytab is a recommended thing
> >> to do. It's not. I mentioned it before because some customers think
> >> it's OK to use the same keytab across their client fleet. But obviously
> >> that will result in lost open and lock state.
> >>
> >> I suggest rephrasing this last sentence to describe the negative lease
> >> recovery consequence of two clients happening to share the same host
> >> principal -- as in "This is why you shouldn't share keytabs..."
> >>
> >
> > How about
> >
> >    .PP
> >    If the clients use different credentials, then the second client to
> >    establish a connection to the server will be refused access which is a
> >    safer failure mode.  For
> >    .B auth=sys
> >    the credential is based on hostname, so will be the same if the
> >    identities are the same.  With
> >    .B auth=krb
> >    the credential is stored in
> >    .I /etc/krb5.keytab
> >    so provided this isn't copied among clients the safer failure mode will result.
>
> With
> .BR auth=krb5 ,
> the client uses the host principal in
> .I /etc/krb5.keytab
> or in some cases, the lone user principal,
> to authenticate lease management operations.
> This securely prevents lease stealing.
>
>
> > ??
> >
> > Thanks for your detailed review!
> >
> > NeilBrown
> >
> >>
> >>> +.PP
> >>> +If the identity is unique but not stable, for example if it is generated
> >>> +randomly on each start up of the NFS client, then crash recovery is
> >>> +affected.  When a client shuts down uncleanly and restarts, the server
> >>> +will normally detect this because the same identity is presented with
> >>> +different boot time (or "incarnation verifier"), and will discard old
> >>> +state.  If the client presents a different identifier, then the server
> >>> +cannot discard old state until the lease time has expired, and the new
> >>> +client may be delayed in opening or locking files that it was
> >>> +previously accessing.
> >>> .SH FILES
> >>> .TP 1.5i
> >>> .I /etc/fstab
> >>> --
> >>> 2.35.1
> >>>
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>

NeilBrown
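For context on the credential modes being contrasted, these are the mount
options they correspond to; the server name and export paths below are
placeholders only.

    # /etc/fstab examples (placeholders).  Lease management operations
    # are authenticated with the client's host credential.
    # AUTH_SYS: credential derived from the client's identity string.
    server:/export  /mnt/a  nfs4  sec=sys   0 0
    # Kerberos: credential comes from a principal in /etc/krb5.keytab,
    # which is why keytabs must not be shared between clients.
    server:/export  /mnt/b  nfs4  sec=krb5  0 0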
> On Mar 17, 2022, at 10:00 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Thu, 17 Mar 2022, Chuck Lever III wrote:
>> Howdy Neil-
>
> G'day
>
>>>> The last sentence is made ambiguous by the use of passive voice.
>>>>
>>>> Suggest: "When hostname uniqueness cannot be guaranteed, the client
>>>> administrator must provide extra identity information."
>>>
>>> Why must the client administrator do this?  Why can't some automated
>>> tool do this?  Or some container-building environment.
>>> That's an advantage of the passive voice, you don't need to assign
>>> responsibility for the verb.
>>
>> My point is that in order to provide the needed information,
>> elevated privilege is required. The current sentence reads as
>> if J. Random User could be interrupted at some point and asked
>> for help.
>>
>> In other words, the documentation should state that this is
>> an administrative task. Here I'm not advocating for a specific
>> mechanism to actually perform that task.
>
> ???  This whole man page is primarily about mount options, particularly
> as they appear in /etc/fstab.  These are not available to the non-admin.
> Why would anyone think this section is any different?

Because the nfs_client_id4 uniquifier is not a mount option and
isn't mentioned anywhere else. It's not going to be familiar to
some. As you and I know, most people are not careful readers.

Do note that nfs(5) is really just an extension of mount(8). The
sections you pointed to earlier (eg, DATA AND METADATA COHERENCE)
are there to provide context explaining how to use NFS mount
options.

The patch you have proposed is for an API and protocol element
that have nothing to do with NFS mount options. That by itself
disqualifies a proposed addition to nfs(5).

I suggest instead constructing an independent man page that is
attached to the /etc file that contains the client ID uniquifier.
Something akin to machine-id(5)?

>>>> I have a problem with basing our default uniqueness guarantee on
>>>> hostnames "most of the time" hoping it will all work out. There
>>>> are simply too many common cases where hostname stability can't be
>>>> relied upon. Our sustaining teams will happily tell us this hope
>>>> hasn't so far been borne out.
>>>
>>> Maybe it has not been borne out because there is no documented
>>> requirement for it that we can point people to.
>>> Clearly containers that use NFS are not currently all configured well to do
>>> this.  Some change is needed.  Maybe adding a unique host name is the
>>> easiest change ... or maybe not.
>>
>> You seem to be documenting the client's current behavior.
>> The tone of the documentation is that this behavior is fine
>> and works for most people.
>
> It certainly works for a lot of people.  Many people are using NFSv4
> quite effectively.  I'm sure there are people who are having problems
> too, but let's not fall for the squeaky wheel fallacy.

For some folks it fails silently and/or requires round trips with
their distributor's call center. I would like not to discount their
experience.

>> It's the second part that I disagree with. Oracle Linux has
>> bugs documenting this behavior is a problem, and I'm sure
>> Red Hat does too. The current behavior is broken. It is this
>> brokenness that we are trying to resolve.
>
> The current behaviour of NFS is NOT broken.  Maybe it is not adequately
> robust against certain configuration choices.  Certainly we should make
> it as robust as we reasonably can.  But let's not overstate the problem.

Years of bug reports suggest I'm not overstating anything.
The plan, for a while now, has been to supplement the use of the hostname to address this very situation. You are now suggesting there is nothing to address, which I find difficult to swallow. >> So let me make a stronger statement: we should not >> document that broken behavior in nfs(5). Instead, we should >> fix that behavior, and then document the golden brown and >> delicious behavior. Updating nfs(5) first is putting >> DeCarte in front of de horse. >> >> >>> Surely NFS is not the *only* service that uses the host name. >>> Encouraging the use of unique host names might benefit others. >> >> Unless you have specific use cases that might benefit from >> ensuring hostname uniqueness, I would beg that you stay >> focused on the immediate issue of how the Linux client >> constructs its nfs_client_id4 strings. >> >> >>> The practical reality is that a great many NFS client installations do >>> currently depend on unique host names - after all, it actually works. >>> Is it really so unreasonable to try to encourage the exceptions to fit >>> the common pattern better? >> >> Yes it is unreasonable. >> >> NFS servers typically have a fixed DNS presence. They have >> to because clients mount by hostname. >> >> NFS clients, on the other hand, are not under that constraint. >> The only time I can think of where a client has to have a >> fixed hostname is if a krb5 host principal is involved. >> >> In so many other cases, eg. mobile computing or elastic >> services, the client hostname is mutable. I don't think >> it's fair to put another constraint on host naming here, >> especially one with implications of service denial or >> data corruption (see below). >> >> >>>> Maybe I'm just stating this to understand the purpose of this >>>> patch, but it could also be used as an "Intended audience" >>>> disclaimer in this new section. >>> >>> OK, so the "purpose of this patch" relates in part to a comment you made >>> earlier, which I include here: >>> >>>> Since it is just a line or two of code, it might be of little >>>> harm just to go with separate implementations for now and stop >>>> talking about it. If it sucks, we can fix the suckage. >>>> >>>> Who volunteers to implement this mechanism in mount.nfs ? >>> >>> I don't think this is the best next step. I think we need to get some >>> container system developer to contribute here. So far we only have >>> second hand anecdotes about problems. I think the most concrete is from >>> Ben suggesting that in at least one container system, using >>> /etc/machine-id is a good idea. >>> >>> I don't think we can change nfs-utils (whether mount.nfs or mount.conf >>> or some other way) to set identity from /etc/machine-id for everyone. >>> So we need at least for that container system to request that change. >>> >>> How would they like to do that? >>> >>> I suggest that we explain the problem to representatives of the various >>> container communities that we have contact with (Well... "you", more >>> than "we" as I don't have contacts). >> >> I'm all for involving one or more container experts. But IMO >> it's not appropriate to update our man page to do that. Let's >> update nfs(5) when we are done with this effort. > > Don't let perfect be the enemy of good. > We were making no progress with "fixing" nfs. Documenting "how it works > today" should never be a bad thing. To be clear, I don't have a problem with documenting the current behavior /somewhere else/. 
I do have a problem documenting it in nfs(5) as a situation that is
fine, given its known shortcomings and the fact that it will be
updated in short order.

> Obviously we can (and must) update
> the documentation when we update the behaviour.
>
> But if some concrete behavioural changes can be agreed and implemented
> through this discussion, I'm happy for the documentation to land only
> after those changes.
>
>>>>> +.IP \- 2
>>>>> +NFS-root (diskless) clients, where the DCHP server (or equivalent) does
>>>>> +not provide a unique host name.
>>>>
>>>> Suggest this addition:
>>>>
>>>> .IP \- 2
>>>>
>>>> Dynamically-assigned hostnames, where the hostname can be changed after
>>>> a client reboot, while the client is booted, or if a client often
>>>> repeatedly connects to multiple networks (for example if it is moved
>>>> from home to an office every day).
>>>
>>> This is a different kettle of fish.  The hostname is *always* included
>>> in the identifier.  If it isn't stable, then the identifier isn't
>>> stable.
>>>
>>> I saw in the history that when you introduced the module parameter it
>>> replaced the hostname.  This caused problems in containers (which had
>>> different host names) so Trond changed it so the module parameter
>>> supplemented the hostname.
>>>
>>> If hostnames are really so poorly behaved I can see there might be a
>>> case to suppress the hostname, but we don't have that option in current
>>> kernels.  Should we add it?
>>
>> I claim that it has become problematic to use the hostname in the
>> nfs_client_id4 string.
>
> In that case, we should fix it - make it possible to exclude the
> hostname from the nfs_client_id4 string.  You make a convincing case.
> Have you thoughts on how we should implement that?

This functionality has been implemented for some time using either
sysfs or a module parameter. Those APIs supplement the hostname
with whatever string is provided. I don't think we need to exclude
the hostname from the nfs_client_id4 -- in fact some folks might
prefer keeping the hostname in there as an eye-catcher. But it's
simply that the hostname by itself does not provide enough
uniqueness.

The plan for some time now has been to construct user space
mechanisms to use the sysfs/module parameter APIs to always plug
in a uniquifier. That relieves the hostname uniqueness dependencies
as long as those mechanisms are used as often as possible.

So in other words, today the default is to use the hostname; using
the random uniquifier is an exception. The plan is to make the
random uniquifier the default, and fall back on the hostname if for
some reason the uniquifier initialization mechanism did not work.

>>> The hostname is copied at boot by NFS, and
>>> if it is included in the /sys/fs/nfs/client/identifier (which would be
>>> pointless, but not harmful) it has again been copied.
>>>
>>> If it is different on subsequent boots, then that is a big problem and
>>> not one that we can currently fix.
>>
>> Yes, we can fix it: don't use the client's hostname but
>> instead use a separate persistent uniquifier, as has been
>> proposed.
>>
>>
>>> ....except that non-persistent client identifiers aren't an enormous
>>> problem, just a possible cause of delays.
>>
>> I disagree, it's a significant issue.
>>
>> - If locks are lost, that is a potential source of data corruption.
>>
>> - If a lease is stolen, that is a denial of service.
>>
>> Our customers take this very seriously.
>
> Of course, as they should.  Data integrity is paramount.
> A non-persistent client identifier doesn't put that at risk - not in
> and of itself.
>
> If a client's identifier changed during the lifetime of one instance of
> the client, then that would allow locks to be lost.  That does NOT
> happen just because you happen to change the host name.  The hostname is
> copied at first use.
> It *could* happen if you changed the module parameter or sysfs identity
> after the first mount, but I hope we can agree that is not a justifiable
> action.
>
> A lease can only be "stolen" by a non-unique identifier, not simply by
> non-persistent identifiers.  But maybe this needs a caveat.

In this thread, I refer mostly to issues caused by nfs_client_id4
non-uniqueness. This is indeed the class of misbehavior that is
significant to our customer base.

Multiple clients might use "localhost.localdomain" simply because
that's the way the imaging template is built. Or when an image is
copied to create a new guest, the hostname is not changed. Those
are but two examples. In many cases, client administrators are
simply not in control of their hostnames.

In cloud deployments, AUTH_SYS is the norm because managing a
large Kerberos realm is generally onerous. Thus AUTH_SYS plus a
hostname-uniquified nfs_client_id4 is by far the common case,
though it is the most risky one.

> If a set of clients are each given host names from time to time which
> are, at any moment in time, unique, but are able to "migrate" from one
> client to another, then it would be possible for two clients to both
> have performed their first NFS mount when they have some common
> hostname X.  The "first" was given hostname X at boot time, it mounted
> something.  The hostname was subsequently changed to Y and some other
> host booted and got X and then mounted from the same server.  This
> would be seriously problematic.  I class this as "non-unique" hostnames,
> not as non-persistent-identifier.
>
>> The NFS client's
>> out-of-the-shrink-wrap default behavior/configuration should be
>> conservative enough to prevent these issues. Customers store
>> mission critical data via NFS. Most customers expect NFS to work
>> reliably without a lot of configuration fuss.
>
> I've been working on the assumption that it is not possible to provide
> ideal zero-config behaviour "out-of-the-shrink-wrap".  You have hinted
> (or more) a few times that this is your goal.  Certainly a worthy goal if
> possible.  Is it possible?
>
> I contend that if there is no common standard for how containers (and
> network namespaces in particular) are used, then it is simply not
> possible to provide perfect out-of-the-box behaviour.  There *must* be
> some local configuration that we cannot enforce through the kernel or
> through nfs-utils.  We can offer, but we cannot enforce.  So we must
> document.
>
> The very best that we could do would be to provide a random component to
> the identifier unless we had a high level of confidence that a unique
> identifier had been provided some other way.  I don't know how to get
> that high level of confidence in a way that doesn't break working
> configurations.
> Ben suggested defaulting 'identity' to a random string for any network
> namespace other than init.  I don't think that is cautious enough.
> Maybe if we did it when the network namespace is not init, but the UTS
> namespace is init.  But that feels like a hack and is probably brittle.
>
> Can you suggest *any* way to improve the "out-of-shrink-wrap" behaviour
> significantly?
Well it sounds like we agree that making the random uniquifier the
default is a good step forward. Just because this has been
contentious so far, I think we should strive for something that is
a best effort but clearly a step up. The fall back can use the
hostname. Over time the remaining gaps can be stopped.

Here are some suggestions that might make it simpler to implement.

1. Ben's tool manufactures the uniquifier if the file doesn't
already exist. That seems somewhat racy. Instead, why not make
installation utilities responsible for creating the uniquifier?
We need some guarantee that when a VM is cloned, the uniquifier
is replaced, for instance; that's well outside nfs-utils' sphere
of influence.

Document the requirements (a la machine-id(5)) then point the
distributors and Docker folks at that. I think that is your plan,
right? I've done the same with at least one of Oracle's
virtualization products, while waiting for a more general
upstream solution.

Then, either each mount.nfs invocation or some part of system
start-up checks for the uniquifier file and pushes the uniquifier
into the local net namespace. (Doing this only once at boot has
its appeal).

If the uniquifier file does not exist, then the NFS client
continues to use a hostname uniquifier. Over time we find and
address the fallback cases.

2. The udev rule mechanism that Trond proposed attempted to
address both init_ns and subsequent namespaces the same way.
Maybe it's time to examine the assumptions there to help us make
more progress. Use independent mechanisms for the init_ns and for
subsequent net namespaces. Perhaps Ben already suggested this.

Looking back over weeks of this conversation, these two use cases
seem fundamentally different from each other. The init_ns has to
handle NFSROOT, can use the boot command line or the module
parameter to deal with PXE booting and so on. The Docker case can
use whatever works better for them.

3. We don't yet have a way to guarantee that the uniquifier is in
place before the first NFS mount is initiated. Talking with
someone who has deep systemd expertise might help.

It might also help at least in the non-container case if the
uniquifier is provided on the kernel command line, the same way
that root= is specified.

4. An alternative for the init_ns case might be to add a
mechanism to initramfs to set the client's uniquifier. On my
clients where containers are not in use, I set the uniquifier
using the module parameter; the module load config file needs to
be added to initramfs before it takes effect.

>>>> If we want to create a good uniquifier here, then combine the
>>>> hostname, netns identity, and/or the host's machine-id and then
>>>> hash that blob with a known strong digest algorithm like
>>>> SHA-256. A man page must not recommend the use of deprecated or
>>>> insecure obfuscation mechanisms.
>>>
>>> I didn't realize the hash that uuidgen uses was deprecated.  Is there
>>> some better way to provide an app-specific obfuscation of a string from
>>> the command line?
>>>
>>> Maybe
>>>    echo nfs-id:`cat /etc/machine-id` | sha256sum
>>>
>>> ??
>>
>> Something like that, yes. But the scriptlet needs to also
>> involve the netns identity somehow.
>
> Hmmm.. the impression I got from Ben was that the container system
> ensured that /etc/machine-id was different in different containers.  So
> there would be no need to add anything.  Of course I should make that
> explicit in the documentation.
>
> It would be nice if we could always use "ip netns identify", but that
> doesn't seem to be generally supported.

If containers provide unique machine-ids, a digest of the machine-id
is fine with me.

Note that many implementations don't tolerate a large nfs_client_id4
string, so keeping the digest size small might be needed. Using
blake2 might be a better choice.

--
Chuck Lever
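To make the size point concrete, a sketch using GNU coreutils' b2sum,
which can emit a truncated BLAKE2 digest.  The 128-bit length and the
input format are illustrative assumptions, not an agreed convention.

    #!/bin/sh
    # Sketch: short BLAKE2b digest of the machine-id, keeping the
    # resulting contribution to nfs_client_id4 small.
    # b2sum --length takes a digest length in bits (GNU coreutils).
    printf 'nfs-id:%s' "$(cat /etc/machine-id)" \
        | b2sum --length=128 | awk '{print $1}' \
        > /sys/fs/nfs/client/net/identifier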
On Sat, 19 Mar 2022, Chuck Lever III wrote:
>
> Here are some suggestions that might make it simpler to implement.
>
> 1. Ben's tool manufactures the uniquifier if the file doesn't
> already exist. That seems somewhat racy. Instead, why not
> make installation utilities responsible for creating the
> uniquifier? We need some guarantee that when a VM is cloned,
> the uniquifier is replaced, for instance; that's well
> outside nfs-utils' sphere of influence.

You say "the file" like that is a well defined concept.  It isn't.
In the context of a container we don't even know if there is *any*
stable local storage.  The existence of "the file" is as much outside
of nfs-utils' sphere of influence as the cloning of a VM is.

At least the cloning of a VM is, or soon will be
(https://lwn.net/Articles/887207/), within the sphere of influence of
the NFS kernel module.  It will be able to detect the fork and .... do
something.  Maybe disable access to all existing mounts and refuse new
mounts until 'identity' has been set.

If NFS had always required identity to be set in a container before
allowing mounts, then the udev approach could work and we would be in
a much better place.  But none of us knew that then, and it is too
late for that now (is it?).

This conversation seems to be going around in circles and not getting
anywhere.  As I have no direct interest (the SUSE bugzilla has
precisely 1 bug relating to NFS and non-unique hostnames, and the
customer seemed to accept the requirement of unique hostnames) I am
going to bow out.  I might post one more attempt at a documentation
update ... or I might not.

Thanks,
NeilBrown
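Where stable storage does exist, the create-if-missing scriptlet quoted
earlier in the thread can be made less racy.  A sketch follows; the
/etc/nfsv4-identity name is borrowed from the naming discussion above and
is still illustrative, not an agreed mechanism.

    #!/bin/sh
    # Sketch: atomically create a persistent identity file if absent,
    # then push it into this network namespace before any NFS mount.
    # ln(1) refuses to overwrite an existing target, so whichever
    # process links first wins and the others reuse its value.
    idfile=/etc/nfsv4-identity
    if [ ! -s "$idfile" ]; then
        tmp=$(mktemp "$idfile.XXXXXX") || exit 1
        uuidgen > "$tmp"
        ln "$tmp" "$idfile" 2>/dev/null
        rm -f "$tmp"
    fi
    cat "$idfile" > /sys/fs/nfs/client/net/identifier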
> On Mar 14, 2022, at 8:41 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Tue, 15 Mar 2022, Chuck Lever III wrote:
>> Hi Neil-
>>
>>> +.IP \- 2
>>> +NFS-root (diskless) clients, where the DCHP server (or equivalent) does
>>> +not provide a unique host name.
>>
>> Suggest this addition:
>>
>> .IP \- 2
>>
>> Dynamically-assigned hostnames, where the hostname can be changed after
>> a client reboot, while the client is booted, or if a client often
>> repeatedly connects to multiple networks (for example if it is moved
>> from home to an office every day).
>
> This is a different kettle of fish.  The hostname is *always* included
> in the identifier.  If it isn't stable, then the identifier isn't
> stable.
>
> I saw in the history that when you introduced the module parameter it
> replaced the hostname.  This caused problems in containers (which had
> different host names) so Trond changed it so the module parameter
> supplemented the hostname.
>
> If hostnames are really so poorly behaved I can see there might be a
> case to suppress the hostname, but we don't have that option in current
> kernels.  Should we add it?

I didn't fully understand this comment before. I assume you are
referring to:

55b592933b7d ("NFSv4: Fix nfs4_init_uniform_client_string for net
namespaces")

That will likely break reboot recovery if the container's nodename
changes over a reboot.

My (probably limited) understanding is that using the udev rule to
always add a uniquifier could have helped make it possible to remove
the hostname from the co_ownerid.

For the record, I take back this statement:

> I don't think we need to
> exclude the hostname from the nfs_client_id4 -- in fact some folks
> might prefer keeping the hostname in there as an eye-catcher. But
> it's simply that the hostname by itself does not provide enough
> uniqueness.

Since the nodename can change at inopportune times (like over a
reboot), including it in the co_ownerid string can sometimes be
problematic. But I don't have a better suggestion at this time.

> This conversation seems to be going around in circles and not getting
> anywhere.  As I have no direct interest (the SUSE bugzilla has
> precisely 1 bug relating to NFS and non-unique hostnames, and the
> customer seemed to accept the requirement of unique hostnames) I am
> going to bow out.  I might post one more attempt at a documentation
> update ... or I might not.

I now agree that the Linux NFS community will need to work with
packagers and distributors, and that we will not arrive at a
one-size-fits-all tool by ourselves.

I'll try to pick up the documentation torch. Thanks for your
efforts so far.

--
Chuck Lever
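One concrete rendering of the start-up idea floated earlier in the thread
("some part of system start-up ... pushes the uniquifier into the local
net namespace"): a sketch of a boot-time oneshot for the init namespace.
The unit wiring, file paths, and unit name are illustrative assumptions,
not a proposed nfs-utils change.

    # /etc/systemd/system/nfs4-identity.service  (illustrative)
    [Unit]
    Description=Set NFSv4 client identity before remote mounts
    ConditionPathExists=/etc/nfsv4-identity
    DefaultDependencies=no
    Before=remote-fs-pre.target

    [Service]
    Type=oneshot
    # Assumes the nfs module is loaded so the sysfs file exists.
    ExecStart=/bin/sh -c 'cat /etc/nfsv4-identity > /sys/fs/nfs/client/net/identifier'

    [Install]
    WantedBy=remote-fs-pre.target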
diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man index d9f34df36b42..5f15abe8cf72 100644 --- a/utils/mount/nfs.man +++ b/utils/mount/nfs.man @@ -1,7 +1,7 @@ .\"@(#)nfs.5" .TH NFS 5 "9 October 2012" .SH NAME -nfs \- fstab format and options for the +nfs \- fstab format and configuration for the .B nfs file systems .SH SYNOPSIS @@ -1844,6 +1844,113 @@ export pathname, but not both, during a remount. For example, merges the mount option .B ro with the mount options already saved on disk for the NFS server mounted at /mnt. +.SH "NFS CLIENT IDENTIFIER" +NFSv4 requires that the client present a unique identifier to the server +to be used to track state such as file locks. By default Linux NFS uses +the host name, as configured at the time of the first NFS mount, +together with some fixed content such as the name "Linux NFS" and the +particular protocol version. When the hostname is guaranteed to be +unique among all client which access the same server this is sufficient. +If hostname uniqueness cannot be assumed, extra identity information +must be provided. +.PP +Some situations which are known to be problematic with respect to unique +host names include: +.IP \- 2 +NFS-root (diskless) clients, where the DCHP server (or equivalent) does +not provide a unique host name. +.IP \- 2 +"containers" within a single Linux host. If each container has a separate +network namespace, but does not use the UTS namespace to provide a unique +host name, then there can be multiple effective NFS clients with the +same host name. +.IP \= 2 +Clients across multiple administrative domains that access a common NFS +server. If assignment of host name is devolved to separate domains, +uniqueness cannot be guaranteed, unless a domain name is included in the +host name. +.SS "Increasing Client Uniqueness" +Apart from the host name, which is the preferred way to differentiate +NFS clients, there are two mechanisms to add uniqueness to the +client identifier. +.TP +.B nfs.nfs4_unique_id +This module parameter can be set to an arbitrary string at boot time, or +when the +.B nfs +module is loaded. This might be suitable for configuring diskless clients. +.TP +.B /sys/fs/nfs/client/net/identifier +This virtual file (available since Linux 5.3) is local to the network +name-space in which it is accessed and so can provided uniqueness between +network namespaces (containers) when the hostname remains uniform. +.RS +.PP +This value is empty on name-space creation. +If the value is to be set, that should be done before the first +mount. If the container system has access to some sort of per-container +identity then that identity, possibly obfuscated as a UUID is privacy is +needed, can be used. Combining the identity with the name of the +container systems would also help. For example: +.RS 4 +echo "ip-netns:`ip netns identify`" \\ +.br + > /sys/fs/nfs/client/net/identifier +.br +uuidgen --sha1 --namespace @url \\ +.br + -N "nfs:`cat /etc/machine-id`" \\ +.br + > /sys/fs/nfs/client/net/identifier +.RE +If the container system provides no stable name, +but does have stable storage, then something like +.RS 4 +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && +.br +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier +.RE +would suffice. 
+.PP +If a container has neither a stable name nor stable (local) storage, +then it is not possible to provide a stable identifier, so providing +a random identifier to ensure uniqueness would be best +.RS 4 +uuidgen > /sys/fs/nfs/client/net/identifier +.RE +.RE +.SS Consequences of poor identity setting +Any two concurrent clients that might access the same server must have +different identifiers for correct operation, and any two consecutive +instances of the same client should have the same identifier for optimal +crash recovery. +.PP +If two different clients present the same identity to a server there are +two possible scenarios. If the clients use the same credential then the +server will treat them as the same client which appears to be restarting +frequently. One client may manage to open some files etc, but as soon +as the other client does anything the first client will lose access and +need to re-open everything. +.PP +If the clients use different credentials, then the second client to +establish a connection to the server will be refused access. For +.B auth=sys +the credential is based on hostname, so will be the same if the +identities are the same. With +.B auth=krb +the credential is stored in +.I /etc/krb5.keytab +and will be the same only if this is copied among hosts. +.PP +If the identity is unique but not stable, for example if it is generated +randomly on each start up of the NFS client, then crash recovery is +affected. When a client shuts down uncleanly and restarts, the server +will normally detect this because the same identity is presented with +different boot time (or "incarnation verifier"), and will discard old +state. If the client presents a different identifier, then the server +cannot discard old state until the lease time has expired, and the new +client may be delayed in opening or locking files that it was +previously accessing. .SH FILES .TP 1.5i .I /etc/fstab
When mounting an NFS filesystem in a network namespace using v4, some
care must be taken to ensure a unique and stable client identity.
Similar care is needed for NFS-root and other situations.

Add documentation explaining the requirements for the NFS identity in
these situations.

Signed-off-by: NeilBrown <neilb@suse.de>
---

I think I've addressed most of the feedback, but please forgive and
remind me if I missed something.

NeilBrown

utils/mount/nfs.man | 109 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 108 insertions(+), 1 deletion(-)