
nfs.man: document requirements for NFS mounts in a container

Message ID 164610623626.24921.6124450559951707560@noble.neil.brown.name (mailing list archive)
State New, archived
Series nfs.man: document requirements for NFS mounts in a container

Commit Message

NeilBrown March 1, 2022, 3:43 a.m. UTC
When mounting NFS filesystems in a network namespace using v4, some care
must be taken to ensure a unique and stable client identity.
Add documentation explaining the requirements for container managers.

Signed-off-by: NeilBrown <neilb@suse.de>
---

NOTE I originally suggested using uuidgen to generate a uuid from a
container name.  I've changed it to use the name as-is because I cannot
see a justification for using a uuid - though I think that was suggested
somewhere in the discussion.
If someone would like to provide that justification, I'm happy to
include it in the document.

Thanks,
NeilBrown


 utils/mount/nfs.man | 63 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

Comments

Chuck Lever March 1, 2022, 3:08 p.m. UTC | #1
> On Feb 28, 2022, at 10:43 PM, NeilBrown <neilb@suse.de> wrote:
> 
> 
> When mounting NFS filesystems in a network namespace using v4, some care
> must be taken to ensure a unique and stable client identity.
> Add documentation explaining the requirements for container managers.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> 
> NOTE I originally suggested using uuidgen to generate a uuid from a
> container name.  I've changed it to use the name as-is because I cannot
> see a justification for using a uuid - though I think that was suggested
> somewhere in the discussion.
> If someone would like to provide that justification, I'm happy to
> include it in the document.
> 
> Thanks,
> NeilBrown
> 
> 
> utils/mount/nfs.man | 63 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
> 
> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> index d9f34df36b42..4ab76fb2df91 100644
> --- a/utils/mount/nfs.man
> +++ b/utils/mount/nfs.man
> @@ -1844,6 +1844,69 @@ export pathname, but not both, during a remount.  For example,
> merges the mount option
> .B ro
> with the mount options already saved on disk for the NFS server mounted at /mnt.
> +.SH "NFS IN A CONTAINER"

To be clear, this explanation is about the operation of the
Linux NFS client in a container environment. The server has
different needs that do not appear to be addressed here.
The section title should be clear that this information
pertains to the client.


> +When NFS is used to mount filesystems in a container, and specifically
> +in a separate network name-space, these mounts are treated as quite
> +separate from any mounts in a different container or not in a
> +container (i.e. in a different network name-space).

It might be helpful to provide an introductory explanation of
how mount works in general in a namespaced environment. There
might already be one somewhere. The above text needs to be
clear that we are not discussing the mount namespace.


> +.P
> +In the NFSv4 protocol, each client must have a unique identifier.

... each client must have a persistent and globally unique
identifier.


> +This is used by the server to determine when a client has restarted,
> +allowing any state from a previous instance can be discarded.

Lots of passive voice here :-)

The server associates a lease with the client's identifier
and a boot instance verifier. The server attaches all of
the client's file open and lock state to that lease, which
it preserves until the client's boot verifier changes.
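
(If it helps to see this concretely: on a server running a reasonably
recent kernel, nfsd exposes the leases it currently holds, including the
identifier string each client presented, so something like

    # run on the NFS server; assumes /proc/fs/nfsd is mounted
    cat /proc/fs/nfsd/clients/*/info

makes it easy to spot two supposedly different clients that are
presenting the same identifier.)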


> So any two
> +concurrent clients that might access the same server MUST have
> +different identifiers, and any two consecutive instances of the same
> +client SHOULD have the same identifier.

Capitalized MUST and SHOULD have specific meanings in IETF
standards that are probably not obvious to average readers
of man pages. To average readers, this looks like shouting.
Can you use something a little friendlier?


> +.P
> +Linux constructs the identifier (referred to as 
> +.B co_ownerid
> +in the NFS specifications) from various pieces of information, three of
> +which can be controlled by the sysadmin:
> +.TP
> +Hostname
> +The hostname can be different in different containers if they
> +have different "UTS" name-spaces.  If the container system ensures
> +each container sees a unique host name,

Actually, it turns out that is a pretty big "if". We've
found that our cloud customers are not careful about
setting unique hostnames. That's exactly why the whole
uniquifier thing is so critical!


> then this is
> +sufficient for a correctly functioning NFS identifier.
> +The host name is copied when the first NFS filesystem is mounted in
> +a given network name-space.  Any subsequent change in the apparent
> +hostname will not change the NFSv4 identifier.

The purpose of using a uuid here is that, given its
definition in RFC 4122, it has very strong global
uniqueness guarantees.

Using a UUID makes hostname uniqueness irrelevant.

Again, I think our goal should be hiding all of this
detail from administrators, because once we get this
mechanism working correctly, there is absolutely no
need for administrators to bother with it.


The remaining part of this text probably should be
part of the man page for Ben's tool, or whatever is
coming next.


> +.TP
> +.B nfs.nfs4_unique_id
> +This module parameter is the same for all containers on a given host
> +so it is not useful to differentiate between containers.
> +.TP
> +.B /sys/fs/nfs/client/net/identifier
> +This virtual file (available since Linux 5.3) is local to the network
> +name-space in which it is accessed and so can provided uniqueness between
> +containers when the hostname is uniform among containers.
> +.RS
> +.PP
> +This value is empty on name-space creation.
> +If the value is to be set, that should be done before the first
> +mount (much as the hostname is copied before the first mount).
> +If the container system has access to some sort of per-container
> +identity, then a command like
> +.RS 4
> +echo "$CONTAINER_IDENTITY" \\
> +.br
> +   > /sys/fs/nfs/client/net/identifier 
> +.RE
> +might be suitable.  If the container system provides no stable name,
> +but does have stable storage, then something like
> +.RS 4
> +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && 
> +.br
> +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier 
> +.RE
> +would suffice.
> +.PP
> +If a container has neither a stable name nor stable (local) storage,
> +then it is not possible to provide a stable identifier, so providing
> +a random one to ensure uniqueness would be best
> +.RS 4
> +uuidgen > /sys/fs/nfs/client/net/identifier
> +.RE
> +.RE
> .SH FILES
> .TP 1.5i
> .I /etc/fstab
> -- 
> 2.35.1
> 

--
Chuck Lever
Chuck Lever March 1, 2022, 3:16 p.m. UTC | #2
> On Mar 1, 2022, at 10:08 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Feb 28, 2022, at 10:43 PM, NeilBrown <neilb@suse.de> wrote:
>> 
>> 
>> When mounting NFS filesystems in a network namespace using v4, some care
>> must be taken to ensure a unique and stable client identity.
>> Add documentation explaining the requirements for container managers.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.de>
>> ---
>> 
>> NOTE I originally suggested using uuidgen to generate a uuid from a
>> container name.  I've changed it to use the name as-is because I cannot
>> see a justification for using a uuid - though I think that was suggested
>> somewhere in the discussion.
>> If someone would like to provide that justification, I'm happy to
>> include it in the document.
>> 
>> Thanks,
>> NeilBrown
>> 
>> 
>> utils/mount/nfs.man | 63 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 63 insertions(+)
>> 
>> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
>> index d9f34df36b42..4ab76fb2df91 100644
>> --- a/utils/mount/nfs.man
>> +++ b/utils/mount/nfs.man
>> @@ -1844,6 +1844,69 @@ export pathname, but not both, during a remount.  For example,
>> merges the mount option
>> .B ro
>> with the mount options already saved on disk for the NFS server mounted at /mnt.
>> +.SH "NFS IN A CONTAINER"
> 
> To be clear, this explanation is about the operation of the
> Linux NFS client in a container environment. The server has
> different needs that do not appear to be addressed here.
> The section title should be clear that this information
> pertains to the client.
> 
> 
>> +When NFS is used to mount filesystems in a container, and specifically
>> +in a separate network name-space, these mounts are treated as quite
>> +separate from any mounts in a different container or not in a
>> +container (i.e. in a different network name-space).
> 
> It might be helpful to provide an introductory explanation of
> how mount works in general in a namespaced environment. There
> might already be one somewhere. The above text needs to be
> clear that we are not discussing the mount namespace.
> 
> 
>> +.P
>> +In the NFSv4 protocol, each client must have a unique identifier.
> 
> ... each client must have a persistent and globally unique
> identifier.
> 
> 
>> +This is used by the server to determine when a client has restarted,
>> +allowing any state from a previous instance can be discarded.
> 
> Lots of passive voice here :-)
> 
> The server associates a lease with the client's identifier
> and a boot instance verifier. The server attaches all of
> the client's file open and lock state to that lease, which
> it preserves until the client's boot verifier changes.

Oh and also, this might be a good opportunity to explain
how the server requires that the client use not only the
same identifier string, but also the same principal to
reattach itself to its open and lock state after a server
reboot.

This is why the Linux NFS client attempts to use Kerberos
whenever it can for this purpose. Using AUTH_SYS invites
another client that happens to have the same identifier
to trigger the server to purge that client's open and lock
state.


>> So any two
>> +concurrent clients that might access the same server MUST have
>> +different identifiers, and any two consecutive instances of the same
>> +client SHOULD have the same identifier.
> 
> Capitalized MUST and SHOULD have specific meanings in IETF
> standards that are probably not obvious to average readers
> of man pages. To average readers, this looks like shouting.
> Can you use something a little friendlier?
> 
> 
>> +.P
>> +Linux constructs the identifier (referred to as 
>> +.B co_ownerid
>> +in the NFS specifications) from various pieces of information, three of
>> +which can be controlled by the sysadmin:
>> +.TP
>> +Hostname
>> +The hostname can be different in different containers if they
>> +have different "UTS" name-spaces.  If the container system ensures
>> +each container sees a unique host name,
> 
> Actually, it turns out that is a pretty big "if". We've
> found that our cloud customers are not careful about
> setting unique hostnames. That's exactly why the whole
> uniquifier thing is so critical!
> 
> 
>> then this is
>> +sufficient for a correctly functioning NFS identifier.
>> +The host name is copied when the first NFS filesystem is mounted in
>> +a given network name-space.  Any subsequent change in the apparent
>> +hostname will not change the NFSv4 identifier.
> 
> The purpose of using a uuid here is that, given its
> definition in RFC 4122, it has very strong global
> uniqueness guarantees.
> 
> Using a UUID makes hostname uniqueness irrelevant.
> 
> Again, I think our goal should be hiding all of this
> detail from administrators, because once we get this
> mechanism working correctly, there is absolutely no
> need for administrators to bother with it.
> 
> 
> The remaining part of this text probably should be
> part of the man page for Ben's tool, or whatever is
> coming next.
> 
> 
>> +.TP
>> +.B nfs.nfs4_unique_id
>> +This module parameter is the same for all containers on a given host
>> +so it is not useful to differentiate between containers.
>> +.TP
>> +.B /sys/fs/nfs/client/net/identifier
>> +This virtual file (available since Linux 5.3) is local to the network
>> +name-space in which it is accessed and so can provided uniqueness between
>> +containers when the hostname is uniform among containers.
>> +.RS
>> +.PP
>> +This value is empty on name-space creation.
>> +If the value is to be set, that should be done before the first
>> +mount (much as the hostname is copied before the first mount).
>> +If the container system has access to some sort of per-container
>> +identity, then a command like
>> +.RS 4
>> +echo "$CONTAINER_IDENTITY" \\
>> +.br
>> +   > /sys/fs/nfs/client/net/identifier 
>> +.RE
>> +might be suitable.  If the container system provides no stable name,
>> +but does have stable storage, then something like
>> +.RS 4
>> +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && 
>> +.br
>> +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier 
>> +.RE
>> +would suffice.
>> +.PP
>> +If a container has neither a stable name nor stable (local) storage,
>> +then it is not possible to provide a stable identifier, so providing
>> +a random one to ensure uniqueness would be best
>> +.RS 4
>> +uuidgen > /sys/fs/nfs/client/net/identifier
>> +.RE
>> +.RE
>> .SH FILES
>> .TP 1.5i
>> .I /etc/fstab
>> -- 
>> 2.35.1
>> 
> 
> --
> Chuck Lever

--
Chuck Lever
NeilBrown March 3, 2022, 3:26 a.m. UTC | #3
On Wed, 02 Mar 2022, Chuck Lever III wrote:
> 
> > On Feb 28, 2022, at 10:43 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > 
> > When mounting NFS filesystems in a network namespace using v4, some care
> > must be taken to ensure a unique and stable client identity.
> > Add documentation explaining the requirements for container managers.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > 
> > NOTE I originally suggested using uuidgen to generate a uuid from a
> > container name.  I've changed it to use the name as-is because I cannot
> > see a justification for using a uuid - though I think that was suggested
> > somewhere in the discussion.
> > If someone would like to provide that justification, I'm happy to
> > include it in the document.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > utils/mount/nfs.man | 63 +++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 63 insertions(+)
> > 
> > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> > index d9f34df36b42..4ab76fb2df91 100644
> > --- a/utils/mount/nfs.man
> > +++ b/utils/mount/nfs.man
> > @@ -1844,6 +1844,69 @@ export pathname, but not both, during a remount.  For example,
> > merges the mount option
> > .B ro
> > with the mount options already saved on disk for the NFS server mounted at /mnt.
> > +.SH "NFS IN A CONTAINER"
> 
> To be clear, this explanation is about the operation of the
> Linux NFS client in a container environment. The server has
> different needs that do not appear to be addressed here.
> The section title should be clear that this information
> pertains to the client.

The whole man page is only about the client, but I agree that clarity is
best.  I've changed the section heading to

    NFS MOUNTS IN A CONTAINER

> 
> 
> > +When NFS is used to mount filesystems in a container, and specifically
> > +in a separate network name-space, these mounts are treated as quite
> > +separate from any mounts in a different container or not in a
> > +container (i.e. in a different network name-space).
> 
> It might be helpful to provide an introductory explanation of
> how mount works in general in a namespaced environment. There
> might already be one somewhere. The above text needs to be
> clear that we are not discussing the mount namespace.

Mount namespaces are completely irrelevant for this discussion.
This is "specifically" about "network name-spaces" a I wrote.
Do I need to say more than that?
Maybe a sentence "Mount namespaces are not relevant" ??

> 
> 
> > +.P
> > +In the NFSv4 protocol, each client must have a unique identifier.
> 
> ... each client must have a persistent and globally unique
> identifier.

I dispute "globally".  The id only needs to be unique among clients of
a given NFS server.
I also dispute "persistent" in the context of "must".
Unless I'm missing something, a lack of persistence only matters when a
client stops while still holding state, and then restarts within the
lease period.  It will then be prevented from establishing conflicting
state until the lease period ends.  So persistence is good, but is not a
hard requirement.  Uniqueness IS a hard requirement among concurrent
clients of the one server.

> 
> 
> > +This is used by the server to determine when a client has restarted,
> > +allowing any state from a previous instance can be discarded.
> 
> Lots of passive voice here :-)
> 
> The server associates a lease with the client's identifier
> and a boot instance verifier. The server attaches all of
> the client's file open and lock state to that lease, which
> it preserves until the client's boot verifier changes.

I guess I"m a passivist.  If we are going for that level of detail we
need to mention lease expiry too.

 .... it preserves until the lease time passes without any renewal from
      the client, or the client's boot verifier changes.

In another email you add:

> Oh and also, this might be a good opportunity to explain
> how the server requires that the client use not only the
> same identifier string, but also the same principal to
> reattach itself to its open and lock state after a server
> reboot.
> 
> This is why the Linux NFS client attempts to use Kerberos
> whenever it can for this purpose. Using AUTH_SYS invites
> other another client that happens to have the same identifier
> to trigger the server to purge that client's open and lock
> state.

How relevant is this to the context of a container?
How much extra context would we need to add to make the mention of
credentials coherent?
Maybe we should add another section about credentials, and add it just
before this one??

> 
> 
> > So any two
> > +concurrent clients that might access the same server MUST have
> > +different identifiers, and any two consecutive instances of the same
> > +client SHOULD have the same identifier.
> 
> Capitalized MUST and SHOULD have specific meanings in IETF
> standards that are probably not obvious to average readers
> of man pages. To average readers, this looks like shouting.
> Can you use something a little friendlier?
> 

How about:

   Any two concurrent clients that might access the same server must
   have different identifiers for correct operation, and any two
   consecutive instances of the same client should have the same
   identifier for optimal handling of an unclean restart.

> 
> > +.P
> > +Linux constructs the identifier (referred to as 
> > +.B co_ownerid
> > +in the NFS specifications) from various pieces of information, three of
> > +which can be controlled by the sysadmin:
> > +.TP
> > +Hostname
> > +The hostname can be different in different containers if they
> > +have different "UTS" name-spaces.  If the container system ensures
> > +each container sees a unique host name,
> 
> Actually, it turns out that is a pretty big "if". We've
> found that our cloud customers are not careful about
> setting unique hostnames. That's exactly why the whole
> uniquifier thing is so critical!

:-)  I guess we keep it as "if" though, not "IF" ....

> 
> 
> > then this is
> > +sufficient for a correctly functioning NFS identifier.
> > +The host name is copied when the first NFS filesystem is mounted in
> > +a given network name-space.  Any subsequent change in the apparent
> > +hostname will not change the NFSv4 identifier.
> 
> The purpose of using a uuid here is that, given its
> definition in RFC 4122, it has very strong global
> uniqueness guarantees.

A uuid generated from a given string (uuidgen -N $name ...) has the same
uniqueness as the $name.  Turning it into a uuid doesn't improve the
uniqueness.  It just provides a standard format and obfuscates the
original.  Neither of those seem necessary here.
I think Ben is considering using /etc/machine-id.  Creating a uuid from
that does make it any better.
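
To illustrate (flags as in util-linux uuidgen; treat this as a sketch):

    # a name-based UUID is deterministic -- the same name always yields
    # the same UUID, so it is exactly as unique as the name is
    uuidgen --namespace @dns --name "$name" --sha1
    # a random (version 4) UUID is different on every invocation
    uuidgen --random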

> 
> Using a UUID makes hostname uniqueness irrelevant.

Only if the UUID is created appropriately.  If, for example, it is
created with -N from some name that is unique on the host, then it needs
to be combined with the hostname to get sufficient uniqueness.
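
i.e. something like (a sketch only; $CTNAME here is a hypothetical
per-host-unique container name):

    # fold the hostname in so the result is unique beyond this host
    uuidgen --namespace @dns --name "$(hostname)-$CTNAME" --sha1 \
        > /sys/fs/nfs/client/net/identifier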

> 
> Again, I think our goal should be hiding all of this
> detail from administrators, because once we get this
> mechanism working correctly, there is absolutely no
> need for administrators to bother with it.

Except when things break.  Then admins will appreciate having the
details so they can track down the breakage.  My desktop didn't boot
this morning.  Systemd didn't tell me why it was hanging though I
eventually discovered that it was "wicked.service" that wasn't reporting
success.  So I'm currently very focused on the need to provide clarity
to sysadmins, even of "irrelevant" details.

But this documentation isn't just for sysadmins, it is for container
developers too, so they can find out how to make their container work
with NFS.

> 
> 
> The remaining part of this text probably should be
> part of the man page for Ben's tool, or whatever is
> coming next.

My position is that there is no need for any tool.  The total amount of
code needed is a couple of lines as presented in the text below.  Why
provide a wrapper just for that?
We *cannot* automatically decide how to find a name or where to store a
generated uuid, so there is no added value that a tool could provide.
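
To be concrete, for a container system that does have some local
persistent storage, the "couple of lines" amount to something like
(paths are examples only):

    # run in the container's network namespace, before the first NFS
    # mount; keep the generated uuid somewhere that survives restarts
    [ -s /etc/nfsv4-identity ] || uuidgen -r > /etc/nfsv4-identity
    cat /etc/nfsv4-identity > /sys/fs/nfs/client/net/identifier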

We cannot unilaterally fix container systems.  We can only tell people
who build these systems of the requirements for NFS.

Thanks,
NeilBrown

> 
> 
> > +.TP
> > +.B nfs.nfs4_unique_id
> > +This module parameter is the same for all containers on a given host
> > +so it is not useful to differentiate between containers.
> > +.TP
> > +.B /sys/fs/nfs/client/net/identifier
> > +This virtual file (available since Linux 5.3) is local to the network
> > +name-space in which it is accessed and so can provided uniqueness between
> > +containers when the hostname is uniform among containers.
> > +.RS
> > +.PP
> > +This value is empty on name-space creation.
> > +If the value is to be set, that should be done before the first
> > +mount (much as the hostname is copied before the first mount).
> > +If the container system has access to some sort of per-container
> > +identity, then a command like
> > +.RS 4
> > +echo "$CONTAINER_IDENTITY" \\
> > +.br
> > +   > /sys/fs/nfs/client/net/identifier 
> > +.RE
> > +might be suitable.  If the container system provides no stable name,
> > +but does have stable storage, then something like
> > +.RS 4
> > +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && 
> > +.br
> > +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier 
> > +.RE
> > +would suffice.
> > +.PP
> > +If a container has neither a stable name nor stable (local) storage,
> > +then it is not possible to provide a stable identifier, so providing
> > +a random one to ensure uniqueness would be best
> > +.RS 4
> > +uuidgen > /sys/fs/nfs/client/net/identifier
> > +.RE
> > +.RE
> > .SH FILES
> > .TP 1.5i
> > .I /etc/fstab
> > -- 
> > 2.35.1
> > 
> 
> --
> Chuck Lever
> 
> 
> 
>
Trond Myklebust March 3, 2022, 2:37 p.m. UTC | #4
On Thu, 2022-03-03 at 14:26 +1100, NeilBrown wrote:
> On Wed, 02 Mar 2022, Chuck Lever III wrote:
> 
> 
> > 
> > 
> > The remaining part of this text probably should be
> > part of the man page for Ben's tool, or whatever is
> > coming next.
> 
> My position is that there is no need for any tool.  The total amount
> of
> code needed is a couple of lines as presented in the text below.  Why
> provide a wrapper just for that?
> We *cannot* automatically decide how to find a name or where to store
> a
> generated uuid, so there is no added value that a tool could provide.
> 
> We cannot unilaterally fix container systems.  We can only tell
> people
> who build these systems of the requirements for NFS.
> 

I disagree with this position. The value of having a standard tool is
that it also creates a standard for how and where the uniquifier is
generated and persisted.

Otherwise you have to deal with the fact that you may have a systemd
script that persists something in one file, a Dockerfile recipe that
generates something at container build time, and then a home-made
script that looks for something in a different location. If you're
trying to debug why your containers are all generating the same
uniquifier, then that can be a problem.
Chuck Lever March 3, 2022, 3:53 p.m. UTC | #5
> On Mar 2, 2022, at 10:26 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Wed, 02 Mar 2022, Chuck Lever III wrote:
>> 
>>> On Feb 28, 2022, at 10:43 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> 
>>> When mounting NFS filesystems in a network namespace using v4, some care
>>> must be taken to ensure a unique and stable client identity.
>>> Add documentation explaining the requirements for container managers.
>>> 
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>> 
>>> NOTE I originally suggested using uuidgen to generate a uuid from a
>>> container name.  I've changed it to use the name as-is because I cannot
>>> see a justification for using a uuid - though I think that was suggested
>>> somewhere in the discussion.
>>> If someone would like to provide that justification, I'm happy to
>>> include it in the document.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> 
>>> utils/mount/nfs.man | 63 +++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 63 insertions(+)
>>> 
>>> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
>>> index d9f34df36b42..4ab76fb2df91 100644
>>> --- a/utils/mount/nfs.man
>>> +++ b/utils/mount/nfs.man
>>> @@ -1844,6 +1844,69 @@ export pathname, but not both, during a remount.  For example,
>>> merges the mount option
>>> .B ro
>>> with the mount options already saved on disk for the NFS server mounted at /mnt.
>>> +.SH "NFS IN A CONTAINER"
>> 
>> To be clear, this explanation is about the operation of the
>> Linux NFS client in a container environment. The server has
>> different needs that do not appear to be addressed here.
>> The section title should be clear that this information
>> pertains to the client.
> 
> The whole man page is only about the client, but I agree that clarity is
> best.  I've changed the section heading to
> 
>    NFS MOUNTS IN A CONTAINER

Actually I've rethought this.

I think the central point of this text needs to be how
the client uniquifier works. It needs to work this way
for all client deployments, whether containerized or not.

There are some important variations that can be called out:

1. Containers (or virtualized clients)

When multiple NFS clients run on the same physical host.


2. NAT

NAT hasn't been mentioned before, but it is a common
deployment scenario where multiple clients can have the
same hostname and local IP address (a private address such
as 192.168.0.55) but the clients all access the same NFS
server.


3. NFSROOT

Where the uniquifier has to be provided on the boot
command line and can't be persisted locally on the
client.
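
(For the NFSROOT case the practical knob is the module parameter passed
on the kernel command line, e.g. something along the lines of

    # illustrative boot parameters; the uuid would be generated once
    # per client and baked into its boot configuration
    root=/dev/nfs nfsroot=server:/export ip=dhcp nfs.nfs4_unique_id=<uuid-for-this-client>

rather than anything written after boot.)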


>>> +When NFS is used to mount filesystems in a container, and specifically
>>> +in a separate network name-space, these mounts are treated as quite
>>> +separate from any mounts in a different container or not in a
>>> +container (i.e. in a different network name-space).
>> 
>> It might be helpful to provide an introductory explanation of
>> how mount works in general in a namespaced environment. There
>> might already be one somewhere. The above text needs to be
>> clear that we are not discussing the mount namespace.
> 
> Mount namespaces are completely irrelevant for this discussion.

Agreed, mount namespaces are irrelevant to this discussion.


> This is "specifically" about "network name-spaces" a I wrote.
> Do I need to say more than that?
> Maybe a sentence "Mount namespaces are not relevant" ??

I would say by way of introduction that "An NFS mount,
unlike a local filesystem mount, exists in both a mount
namespace and a network namespace", then continue with
"this is specifically about network namespaces."


>>> +.P
>>> +In the NFSv4 protocol, each client must have a unique identifier.
>> 
>> ... each client must have a persistent and globally unique
>> identifier.
> 
> I dispute "globally".  The id only needs to be unique among clients of
> a given NFS server.

Practically speaking, that is correct in a limited sense.

However there is no limit on the use of a laptop (ie, a
physically portable client) to access any NFS server that
is local to it. We have no control over how clients are
physically deployed.

A public NFS server is going to see a vast cohort of
clients, all of which need to have unique identifiers.
There's no interaction amongst the clients themselves to
determine whether there are identifier collisions.

Global uniqueness therefore is a requirement to make
that work seamlessly.


> I also dispute "persistent" in the context of "must".
> Unless I'm missing something, a lack of persistence only matters when a
> client stops while still holding state, and then restarts within the
> lease period.  It will then be prevented from establishing conflicting
> state until the lease period ends.

The client's identifier needs to be persistent so that:

1. If the server reboots, it can recognize when clients
   are re-establishing their lock and open state versus
   an unfamiliar creating lock and open state that might
   involve files that an existing client has open.

2. If the client reboots, the server is able to tie the
   rebooted client to an existing lease so that the lease
   and all of the client's previous lock and open state
   are properly purged.

There are moments when a client's identifier can change
without consequences. It's not entirely relevant to the
discussion to go into detail about when those moments
occur.


> So persistence is good, but is not a
> hard requirement.  Uniqueness IS a hard requirement among concurrent
> clients of the one server.

OK, then you were using the colloquial meaning of "must"
and "should", not the RFC 2119 meanings. Capitalizing
them was very confusing. Happily you provided a good
replacement below.


>>> +This is used by the server to determine when a client has restarted,
>>> +allowing any state from a previous instance can be discarded.
>> 
>> Lots of passive voice here :-)
>> 
>> The server associates a lease with the client's identifier
>> and a boot instance verifier. The server attaches all of
>> the client's file open and lock state to that lease, which
>> it preserves until the client's boot verifier changes.
> 
> I guess I"m a passivist.  If we are going for that level of detail we
> need to mention lease expiry too.
> 
> .... it preserves until the lease time passes without any renewal from
>      the client, or the client's boot verifier changes.

This is not entirely true. A server is not required to
dispense with a client's lease state when the lease
period is up. The Linux server does that today, but
soon it won't, instead waiting until a conflicting
open or lock request before it purges the lease of
an unreachable client.

The requirement is actually the converse: the server
must preserve a client's open and lock state during
the lease period. Outside of the lease period,
behavior is an implementation choice.


> In another email you add:
> 
>> Oh and also, this might be a good opportunity to explain
>> how the server requires that the client use not only the
>> same identifier string, but also the same principal to
>> reattach itself to its open and lock state after a server
>> reboot.
>> 
>> This is why the Linux NFS client attempts to use Kerberos
>> whenever it can for this purpose. Using AUTH_SYS invites
>> other another client that happens to have the same identifier
>> to trigger the server to purge that client's open and lock
>> state.
> 
> How relevant is this to the context of a container?

It's relevant because the client's identity consists
of the nfs_client_id4 string and the principal and
authentication flavor used to establish the lease.

If a container is manufactured by duplicating a
template that contains a keytab (and yes, I've seen
this done in practice) the principal and flavor
will be the same in the duplicated container, and
that will be a problem.

If the client is using only AUTH_SYS, as I mention
above, then the only distinction is the nfs_client_id4
string itself (since clients typically use UID 0 as
the principal in this case). There is really no
protection here -- and admins need to be warned
about this because their users will see open and
lock state disappearing for no reason because some
clients happen to choose the same nfs_client_id4 string
and are purging each other's lease.
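
(Which is one more reason to suggest that containerized clients mount
with Kerberos where the environment supports it, e.g. something like

    mount -t nfs4 -o sec=krb5 server.example.com:/export /mnt

with each container holding its own machine credential rather than a
copy of a shared keytab.)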


> How much extra context would be need to add to make the mention of
> credentials coherent?
> Maybe we should add another section about credentials, and add it just
> before this one??

See above. The central discussion needs to be about
client identity IMO.


>>> So any two
>>> +concurrent clients that might access the same server MUST have
>>> +different identifiers, and any two consecutive instances of the same
>>> +client SHOULD have the same identifier.
>> 
>> Capitalized MUST and SHOULD have specific meanings in IETF
>> standards that are probably not obvious to average readers
>> of man pages. To average readers, this looks like shouting.
>> Can you use something a little friendlier?
>> 
> 
> How about:
> 
>   Any two concurrent clients that might access the same server must
>   have different identifiers for correct operation, and any two
>   consecutive instances of the same client should have the same
>   identifier for optimal handling of an unclean restart.

Nice.


>>> +.P
>>> +Linux constructs the identifier (referred to as 
>>> +.B co_ownerid
>>> +in the NFS specifications) from various pieces of information, three of
>>> +which can be controlled by the sysadmin:
>>> +.TP
>>> +Hostname
>>> +The hostname can be different in different containers if they
>>> +have different "UTS" name-spaces.  If the container system ensures
>>> +each container sees a unique host name,
>> 
>> Actually, it turns out that is a pretty big "if". We've
>> found that our cloud customers are not careful about
>> setting unique hostnames. That's exactly why the whole
>> uniquifier thing is so critical!
> 
> :-)  I guess we keep it as "if" though, not "IF" ....

And as mentioned above, it's not possible for them
to select hostnames and IP addresses (in particular
in the private IP address range) that are guaranteed
to be unique enough for a given server. The choices
are completely uncoordinated and have a considerable
risk of collision.


>>> then this is
>>> +sufficient for a correctly functioning NFS identifier.
>>> +The host name is copied when the first NFS filesystem is mounted in
>>> +a given network name-space.  Any subsequent change in the apparent
>>> +hostname will not change the NFSv4 identifier.
>> 
>> The purpose of using a uuid here is that, given its
>> definition in RFC 4122, it has very strong global
>> uniqueness guarantees.
> 
> A uuid generated from a given string (uuidgen -N $name ...) has the same
> uniqueness as the $name.  Turning it into a uuid doesn't improve the
> uniqueness.  It just provides a standard format and obfuscates the
> original.  Neither of those seem necessary here.

If indeed that's what's going on, then that's the
wrong approach. We need to have a globally unique
identifier here. If hashing a hostname has the risk
that the digest will be the same for two clients, then
that version of UUID is not usable for our purpose.

The non-globally unique versions of UUID are hardly
used any more because folks who use UUIDs generally
need a guarantee of global uniqueness without a
central coordinating authority. Time-based and
randomly generated UUIDs are typically the only
style that are used any more.


> I think Ben is considering using /etc/mechine-id.  Creating a uuid from
> that does make it any better.

I assume you mean "does /not/ make it any better".

As long as the machine-id is truly random
and is not, say, a hash of the hostname, then it
should work fine. The only downside of machine-id
is the man page's stipulation that the machine-id
shouldn't be publicly exposed on the network, which
is why it ought to be at least hashed before it is used
as part of an nfs_client_id4.

So I guess there's a third requirement, aside from
persistence and global uniqueness: Information about
the sender (client in this case) is not inadvertently
leaked onto the open network.
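
(For instance, rather than writing /etc/machine-id to the identifier
file directly, a sketch like

    # hash the machine-id so the raw value never appears on the wire
    sha256sum /etc/machine-id | awk '{print $1}' \
        > /sys/fs/nfs/client/net/identifier

would preserve its uniqueness without exposing it; the particular hash
is just an example.)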


>> Using a UUID makes hostname uniqueness irrelevant.
> 
> Only if the UUID is created appropriately.  If, for example, it is
> created with -N from some name that is unique on the host, then it needs
> to be combined with the hostname to get sufficient uniqueness.

Then that's the wrong version of UUID to use.


>> Again, I think our goal should be hiding all of this
>> detail from administrators, because once we get this
>> mechanism working correctly, there is absolutely no
>> need for administrators to bother with it.
> 
> Except when things break.  Then admins will appreciate having the
> details so they can track down the breakage.  My desktop didn't boot
> this morning.  Systemd didn't tell me why it was hanging though I
> eventually discovered that it was "wicked.service" that wasn't reporting
> success.  So I'm currently very focused on the need to provide clarity
> to sysadmins, even of "irrelevant" details.
> 
> But this documentation isn't just for sysadmins, it is for container
> developers too, so they can find out how to make their container work
> with NFS.

An alternative location for this detail would be under
Documentation/. A man page is possibly not the right
venue for a detailed explanation of protocol and
implementation; man pages usually are limited to quick
summaries of interfaces.


>> The remaining part of this text probably should be
>> part of the man page for Ben's tool, or whatever is
>> coming next.
> 
> My position is that there is no need for any tool.

Trond's earlier point about having to repeat this
functionality for other ways of mounting NFS
(eg Busybox) suggests we have to have a separate tool,
even though this is only a handful of lines of code.


> The total amount of
> code needed is a couple of lines as presented in the text below.  Why
> provide a wrapper just for that?
> We *cannot* automatically decide how to find a name or where to store a
> generated uuid, so there is no added value that a tool could provide.

I don't think anyone has yet demonstrated (or even
stated) this is impossible. Can you explain why you
believe this?


> We cannot unilaterally fix container systems.  We can only tell people
> who build these systems of the requirements for NFS.


--
Chuck Lever
NeilBrown March 4, 2022, 1:13 a.m. UTC | #6
On Fri, 04 Mar 2022, Trond Myklebust wrote:
> On Thu, 2022-03-03 at 14:26 +1100, NeilBrown wrote:
> > On Wed, 02 Mar 2022, Chuck Lever III wrote:
> > 
> > 
> > > 
> > > 
> > > The remaining part of this text probably should be
> > > part of the man page for Ben's tool, or whatever is
> > > coming next.
> > 
> > My position is that there is no need for any tool.  The total amount
> > of
> > code needed is a couple of lines as presented in the text below.  Why
> > provide a wrapper just for that?
> > We *cannot* automatically decide how to find a name or where to store
> > a
> > generated uuid, so there is no added value that a tool could provide.
> > 
> > We cannot unilaterally fix container systems.  We can only tell
> > people
> > who build these systems of the requirements for NFS.
> > 
> 
> I disagree with this position. The value of having a standard tool is
> that it also creates a standard for how and where the uniquifier is
> generated and persisted.
> 
> Otherwise you have to deal with the fact that you may have a systemd
> script that persists something in one file, a Dockerfile recipe that
> generates something at container build time, and then a home-made
> script that looks for something in a different location. If you're
> trying to debug why your containers are all generating the same
> uniquifier, then that can be a problem.

I don't see how a tool can provide any consistency.
Is there some standard that says how containers should be built, and
where tools can store persistent data?  If not, the tool needs to be
configured, and that is not importantly different from bash being
configured with a 1-line script to write out the identifier.

I'm not strongly against a tool, I just can't see the benefit.

Thanks,
NeilBrown
Steve Dickson March 4, 2022, 3:54 p.m. UTC | #7
Hey!

On 3/3/22 8:13 PM, NeilBrown wrote:
> On Fri, 04 Mar 2022, Trond Myklebust wrote:
>> On Thu, 2022-03-03 at 14:26 +1100, NeilBrown wrote:
>>> On Wed, 02 Mar 2022, Chuck Lever III wrote:
>>>
>>>
>>>>
>>>>
>>>> The remaining part of this text probably should be
>>>> part of the man page for Ben's tool, or whatever is
>>>> coming next.
>>>
>>> My position is that there is no need for any tool.  The total amount
>>> of
>>> code needed is a couple of lines as presented in the text below.  Why
>>> provide a wrapper just for that?
>>> We *cannot* automatically decide how to find a name or where to store
>>> a
>>> generated uuid, so there is no added value that a tool could provide.
>>>
>>> We cannot unilaterally fix container systems.  We can only tell
>>> people
>>> who build these systems of the requirements for NFS.
>>>
>>
>> I disagree with this position. The value of having a standard tool is
>> that it also creates a standard for how and where the uniquifier is
>> generated and persisted.
>>
>> Otherwise you have to deal with the fact that you may have a systemd
>> script that persists something in one file, a Dockerfile recipe that
>> generates something at container build time, and then a home-made
>> script that looks for something in a different location. If you're
>> trying to debug why your containers are all generating the same
>> uniquifier, then that can be a problem.
> 
> I don't see how a tool can provide any consistency.
> Is there some standard that say how containers should be built, and
> where tools can store persistent data?  If not, the tool needs to be
> configured, and that is not importantly different from bash being
> configured with a 1-line script to write out the identifier.
> 
> I'm not strongly against a tools, I just can't see the benefit.
I think I agree with this... Thinking about it... having a command that
tries to manipulate different containers in different ways just
seems like a recipe for disaster... I just don't see how a command would
ever get it right... Hell, we can't agree on the command's name,
much less what it will do. :-)

So I like the idea of documenting what needs to happen in the
different types of containers... So I think the man page
is the way to go... and I think it is the safest way to go.

Chuck, if you would like to tweak the verbiage... by all means.

Neil, will there be a V2 of the man page patch from this discussion,
or should I just take the one you posted? If you do post
a V2, please start a new thread.

steved.
Chuck Lever March 4, 2022, 4:15 p.m. UTC | #8
> On Mar 4, 2022, at 10:54 AM, Steve Dickson <steved@redhat.com> wrote:
> 
> Hey!
> 
> On 3/3/22 8:13 PM, NeilBrown wrote:
>> On Fri, 04 Mar 2022, Trond Myklebust wrote:
>>> On Thu, 2022-03-03 at 14:26 +1100, NeilBrown wrote:
>>>> On Wed, 02 Mar 2022, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> The remaining part of this text probably should be
>>>>> part of the man page for Ben's tool, or whatever is
>>>>> coming next.
>>>> 
>>>> My position is that there is no need for any tool.  The total amount
>>>> of
>>>> code needed is a couple of lines as presented in the text below.  Why
>>>> provide a wrapper just for that?
>>>> We *cannot* automatically decide how to find a name or where to store
>>>> a
>>>> generated uuid, so there is no added value that a tool could provide.
>>>> 
>>>> We cannot unilaterally fix container systems.  We can only tell
>>>> people
>>>> who build these systems of the requirements for NFS.
>>>> 
>>> 
>>> I disagree with this position. The value of having a standard tool is
>>> that it also creates a standard for how and where the uniquifier is
>>> generated and persisted.
>>> 
>>> Otherwise you have to deal with the fact that you may have a systemd
>>> script that persists something in one file, a Dockerfile recipe that
>>> generates something at container build time, and then a home-made
>>> script that looks for something in a different location. If you're
>>> trying to debug why your containers are all generating the same
>>> uniquifier, then that can be a problem.
>> I don't see how a tool can provide any consistency.

It seems to me that having a tool with its own man page directed
towards Linux distributors would be the central place for this
kind of configuration and implementation. Otherwise, we will have
to ensure this is done correctly for each implementation of
mount.


>> Is there some standard that say how containers should be built, and
>> where tools can store persistent data?  If not, the tool needs to be
>> configured, and that is not importantly different from bash being
>> configured with a 1-line script to write out the identifier.

IMO six of one, half dozen of another. I don't see this being
any more or less safe than changing each implementation of mount
to deal with an NFS-specific setting.


>> I'm not strongly against a tools, I just can't see the benefit.
> I think I agree with this... Thinking about it... having a command that
> tries to manipulate different containers in different ways just
> seems like a recipe for disaster... I just don't see how a command would
> ever get it right... Hell we can't agree on its command's name
> much less what it will do. :-)

To be clear what you are advocating, each implementation of mount.nfs,
including the ones that are not shipped with nfs-utils (like Busybox
and initramfs) will need to provide a mechanism for setting the client
uniquifier. Just to confirm that is what is behind door number one.

Since it is just a line or two of code, it might be of little
harm just to go with separate implementations for now and stop
talking about it. If it sucks, we can fix the suckage.

Who volunteers to implement this mechanism in mount.nfs ?


> So I like idea of documenting when needs to happen in the
> different types of containers... So I think the man page
> is the way to go... and I think it is the safest way to go.
> 
> Chuck, if you would like tweak the verbiage... by all means.

I stand ready.


> Neil, will be a V2 for man page patch from this discussion
> or should I just take the one you posted? If you do post
> a V2, please start a new thread.
> 
> steved.

--
Chuck Lever
Steve Dickson March 4, 2022, 4:54 p.m. UTC | #9
On 3/4/22 11:15 AM, Chuck Lever III wrote:
> 
> 
>> On Mar 4, 2022, at 10:54 AM, Steve Dickson <steved@redhat.com> wrote:
>>
>> Hey!
>>
>> On 3/3/22 8:13 PM, NeilBrown wrote:
>>> On Fri, 04 Mar 2022, Trond Myklebust wrote:
>>>> On Thu, 2022-03-03 at 14:26 +1100, NeilBrown wrote:
>>>>> On Wed, 02 Mar 2022, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> The remaining part of this text probably should be
>>>>>> part of the man page for Ben's tool, or whatever is
>>>>>> coming next.
>>>>>
>>>>> My position is that there is no need for any tool.  The total amount
>>>>> of
>>>>> code needed is a couple of lines as presented in the text below.  Why
>>>>> provide a wrapper just for that?
>>>>> We *cannot* automatically decide how to find a name or where to store
>>>>> a
>>>>> generated uuid, so there is no added value that a tool could provide.
>>>>>
>>>>> We cannot unilaterally fix container systems.  We can only tell
>>>>> people
>>>>> who build these systems of the requirements for NFS.
>>>>>
>>>>
>>>> I disagree with this position. The value of having a standard tool is
>>>> that it also creates a standard for how and where the uniquifier is
>>>> generated and persisted.
>>>>
>>>> Otherwise you have to deal with the fact that you may have a systemd
>>>> script that persists something in one file, a Dockerfile recipe that
>>>> generates something at container build time, and then a home-made
>>>> script that looks for something in a different location. If you're
>>>> trying to debug why your containers are all generating the same
>>>> uniquifier, then that can be a problem.
>>> I don't see how a tool can provide any consistency.
> 
> It seems to me that having a tool with its own man page directed
> towards Linux distributors would be the central place for this
> kind of configuration and implementation. Otherwise, we will have
> to ensure this is done correctly for each implementation of
> mount.
> 
> 
>>> Is there some standard that say how containers should be built, and
>>> where tools can store persistent data?  If not, the tool needs to be
>>> configured, and that is not importantly different from bash being
>>> configured with a 1-line script to write out the identifier.
> 
> IMO six of one, half dozen of another. I don't see this being
> any more or less safe than changing each implementation of mount
> to deal with an NFS-specific setting.
> 
> 
>>> I'm not strongly against a tools, I just can't see the benefit.
>> I think I agree with this... Thinking about it... having a command that
>> tries to manipulate different containers in different ways just
>> seems like a recipe for disaster... I just don't see how a command would
>> ever get it right... Hell we can't agree on its command's name
>> much less what it will do. :-)
> 
> To be clear what you are advocating, each implementation of mount.nfs,
> including the ones that are not shipped with nfs-utils (like Busybox
> and initramfs) will need to provide a mechanism for setting the client
> uniquifier. Just to confirm that is what is behind door number one.
Well, I can't speak for the mount.nfs implementations that are not in
nfs-utils.  I'm assuming they are going to do... what they are going to
do... regardless of what we do. At least we will give them a
guideline of what needs to be done.


> 
> Since it is just a line or two of code, it might be of little
> harm just to go with separate implementations for now and stop
> talking about it. If it sucks, we can fix the suckage.
Right, I see documenting what needs to happen as the
first step. Heck, we don't even know how accurate that
documentation is... yet! Once we vet the doc to make
sure it is accurate... maybe then we could come up
with an auto-configuration solution like those that have been
proposed.

> 
> Who volunteers to implement this mechanism in mount.nfs ?
Well, there have been 3 implementations shot down,
which tells me, as a community, we need to get
more of an idea of what needs to happen and how.
That's why I think Neil's man page additions
are a good start.

> 
> 
>> So I like idea of documenting when needs to happen in the
>> different types of containers... So I think the man page
>> is the way to go... and I think it is the safest way to go.
>>
>> Chuck, if you would like tweak the verbiage... by all means.
> 
> I stand ready.
That I have confidence in. :-) Thank you!

steved.
> 
> 
>> Neil, will be a V2 for man page patch from this discussion
>> or should I just take the one you posted? If you do post
>> a V2, please start a new thread.
>>
>> steved.
> 
> --
> Chuck Lever
> 
> 
>
NeilBrown March 5, 2022, 1:15 a.m. UTC | #10
On Sat, 05 Mar 2022, Steve Dickson wrote:
> Hey!
> 
> On 3/3/22 8:13 PM, NeilBrown wrote:
> > On Fri, 04 Mar 2022, Trond Myklebust wrote:
> >> On Thu, 2022-03-03 at 14:26 +1100, NeilBrown wrote:
> >>> On Wed, 02 Mar 2022, Chuck Lever III wrote:
> >>>
> >>>
> >>>>
> >>>>
> >>>> The remaining part of this text probably should be
> >>>> part of the man page for Ben's tool, or whatever is
> >>>> coming next.
> >>>
> >>> My position is that there is no need for any tool.  The total amount
> >>> of
> >>> code needed is a couple of lines as presented in the text below.  Why
> >>> provide a wrapper just for that?
> >>> We *cannot* automatically decide how to find a name or where to store
> >>> a
> >>> generated uuid, so there is no added value that a tool could provide.
> >>>
> >>> We cannot unilaterally fix container systems.  We can only tell
> >>> people
> >>> who build these systems of the requirements for NFS.
> >>>
> >>
> >> I disagree with this position. The value of having a standard tool is
> >> that it also creates a standard for how and where the uniquifier is
> >> generated and persisted.
> >>
> >> Otherwise you have to deal with the fact that you may have a systemd
> >> script that persists something in one file, a Dockerfile recipe that
> >> generates something at container build time, and then a home-made
> >> script that looks for something in a different location. If you're
> >> trying to debug why your containers are all generating the same
> >> uniquifier, then that can be a problem.
> > 
> > I don't see how a tool can provide any consistency.
> > Is there some standard that say how containers should be built, and
> > where tools can store persistent data?  If not, the tool needs to be
> > configured, and that is not importantly different from bash being
> > configured with a 1-line script to write out the identifier.
> > 
> > I'm not strongly against a tools, I just can't see the benefit.
> I think I agree with this... Thinking about it... having a command that
> tries to manipulate different containers in different ways just
> seems like a recipe for disaster... I just don't see how a command would
> ever get it right... Hell we can't agree on its command's name
> much less what it will do. :-)
> 
> So I like idea of documenting when needs to happen in the
> different types of containers... So I think the man page
> is the way to go... and I think it is the safest way to go.
> 
> Chuck, if you would like tweak the verbiage... by all means.
> 
> Neil, will be a V2 for man page patch from this discussion
> or should I just take the one you posted? If you do post
> a V2, please start a new thread.

I'll post a V2.  Chuck made some excellent structural suggestions.

Thanks,
NeilBrown
NeilBrown March 8, 2022, 12:44 a.m. UTC | #11
On Fri, 04 Mar 2022, Chuck Lever III wrote:
> 
> 2. NAT
> 
> NAT hasn't been mentioned before, but it is a common
> deployment scenario where multiple clients can have the
> same hostname and local IP address (a private address such
> as 192.168.0.55) but the clients all access the same NFS
> server.

I can't see how NAT is relevant.  Whether or not clients have the same
host name seems to be independent of whether or not they access the
server through NAT.
What am I missing?
> 
> The client's identifier needs to be persistent so that:
> 
> 1. If the server reboots, it can recognize when clients
>    are re-establishing their lock and open state versus
>    an unfamiliar creating lock and open state that might
>    involve files that an existing client has open.

The protocol requires clients which are re-establishing state to
explicitly say "I am re-establishing state" (e.g. CLAIM_PREVIOUS).
Clients which are creating new state don't make that claim.

If the server maintains persistent state, then the rebooted server needs
to use the client identifier to find the persistent state, but that is
not importantly different from the more common situation of a server
which hasn't rebooted and needs to find the appropriate state.

Again - what am I missing?


Thanks,
NeilBrown
Chuck Lever March 9, 2022, 4:28 p.m. UTC | #12
> On Mar 7, 2022, at 7:44 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 04 Mar 2022, Chuck Lever III wrote:
>> 
>> 2. NAT
>> 
>> NAT hasn't been mentioned before, but it is a common
>> deployment scenario where multiple clients can have the
>> same hostname and local IP address (a private address such
>> as 192.168.0.55) but the clients all access the same NFS
>> server.
> 
> I can't see how NAT is relevant.  Whether or not clients have the same
> host name seems to be independent of whether or not they access the
> server through NAT.
> What am I missing?

The usual construction of Linux's nfs_client_id4 includes
the hostname and client IP address. If two clients behind
two independent NAT boxes happen to use the same private
IP address and the same hostname (for example
"localhost.localdomain" is a common misconfiguration) then
both of these clients present the same nfs_client_id4
string to the NFS server.

Hilarity ensues.
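
For concreteness, a minimal sketch (not part of the patch; the
interface name eth0 and the values shown are only illustrative) of the
two inputs that collide in this scenario:

    # Run on each client behind its own NAT box:
    hostname                       # e.g. "localhost.localdomain" on both
    ip -4 -o addr show dev eth0    # e.g. 192.168.0.55/24 on both
    # The same hostname combined with the same private address means
    # both clients present the same nfs_client_id4 string to the server.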


>> The client's identifier needs to be persistent so that:
>> 
>> 1. If the server reboots, it can recognize when clients
>>   are re-establishing their lock and open state versus
>>   an unfamiliar client creating lock and open state that might
>>   involve files that an existing client has open.
> 
> The protocol requires clients which are re-establishing state to
> explicitly say "I am re-establishing state" (e.g. CLAIM_PREVIOUS).
> Clients which are creating new state don't make that claim.
> 
> If the server maintains persistent state, then the rebooted server needs
> to use the client identifier to find that persistent state, but that is
> not importantly different from the more common situation of a server
> which hasn't rebooted and needs to find the appropriate state.
> 
> Again - what am I missing?

The server records each client's nfs_client_id4 and its
boot verifier.

It's my understanding that the server is required to reject
CLAIM_PREVIOUS opens if it does not recognize either the
nfs_client_id4 string or its boot verifier, since that
means that the client had no previous state during the most
recent server epoch.


--
Chuck Lever
NeilBrown March 10, 2022, 12:37 a.m. UTC | #13
On Thu, 10 Mar 2022, Chuck Lever III wrote:
> 
> > On Mar 7, 2022, at 7:44 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Fri, 04 Mar 2022, Chuck Lever III wrote:
> >> 
> >> 2. NAT
> >> 
> >> NAT hasn't been mentioned before, but it is a common
> >> deployment scenario where multiple clients can have the
> >> same hostname and local IP address (a private address such
> >> as 192.168.0.55) but the clients all access the same NFS
> >> server.
> > 
> > I can't see how NAT is relevant.  Whether or not clients have the same
> > host name seems to be independent of whether or not they access the
> > server through NAT.
> > What am I missing?
> 
> The usual construction of Linux's nfs_client_id4 includes
> the hostname and client IP address. If two clients behind
> two independent NAT boxes happen to use the same private
> IP address and the same hostname (for example
> "localhost.localdomain" is a common misconfiguration) then
> both of these clients present the same nfs_client_id4
> string to the NFS server.
> 
> Hilarity ensues.

This would only apply to NFSv4.0 (and without migration enabled).
NFSv4.1 and later don't include the IP address in the client identity.

So I think the scenario you describe is primarily a problem of the
hostname being misconfigured.  In NFSv4.0 the normal variation in IP
addresses can hide that problem.  If NAT is used in such a way that two
clients are configured with the same IP address, that defeats the hiding.

I don't think the extra complexity of NAT really makes this more
interesting.  The problem is uniform hostnames, and the fix is the same
as for any other case of uniform host names.
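
To make that fix concrete, here is a sketch (the uuid, the modprobe
file name and the container identity below are only examples; the
sysfs identifier file is the one described in the patch):

    # Host-wide: set the nfs.nfs4_unique_id module parameter, for
    # example via a modprobe option read when the nfs module loads.
    echo "options nfs nfs4_unique_id=$(uuidgen)" \
        > /etc/modprobe.d/nfs4-unique-id.conf

    # Per network name-space (Linux 5.3 and later): write the
    # per-container identifier from inside the container, before its
    # first NFS mount there.
    echo "my-stable-container-name" > /sys/fs/nfs/client/net/identifier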

> 
> 
> >> The client's identifier needs to be persistent so that:
> >> 
> >> 1. If the server reboots, it can recognize when clients
> >>   are re-establishing their lock and open state versus
> >>   an unfamiliar client creating lock and open state that might
> >>   involve files that an existing client has open.
> > 
> > The protocol requires clients which are re-establishing state to
> > explicitly say "I am re-establishing state" (e.g. CLAIM_PREVIOUS).
> > Clients which are creating new state don't make that claim.
> > 
> > If the server maintains persistent state, then the rebooted server needs
> > to use the client identifier to find that persistent state, but that is
> > not importantly different from the more common situation of a server
> > which hasn't rebooted and needs to find the appropriate state.
> > 
> > Again - what am I missing?
> 
> The server records each client's nfs_client_id4 and its
> boot verifier.
> 
> It's my understanding that the server is required to reject
> CLAIM_PREVIOUS opens if it does not recognize either the
> nfs_client_id4 string or its boot verifier, since that
> means that the client had no previous state during the most
> recent server epoch.

I think we are saying the same thing with different words.
When you wrote

    If the server reboots, it can recognize when clients
    are re-establishing their lock and open state 

I think that "validate" is more relevant than "recognize".  The server
knows from the request that an attempt is being made to reestablish
state.  The client identity, credential, and boot verifier are used
to validate that request.

But essentially we are on the same page here.

Thanks,
NeilBrown
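
For reference, the inputs to the client identity that are discussed in
this thread can be inspected with a few commands (a sketch; the paths
are those named in the patch below, and the module-parameter path
assumes the nfs module is loaded):

    hostname                                       # copied at the first NFS mount in a name-space
    cat /sys/module/nfs/parameters/nfs4_unique_id  # host-wide module parameter
    cat /sys/fs/nfs/client/net/identifier          # per-name-space value (Linux 5.3+, may be empty)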

Patch

diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index d9f34df36b42..4ab76fb2df91 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -1844,6 +1844,69 @@  export pathname, but not both, during a remount.  For example,
 merges the mount option
 .B ro
 with the mount options already saved on disk for the NFS server mounted at /mnt.
+.SH "NFS IN A CONTAINER"
+When NFS is used to mount filesystems in a container, and specifically
+in a separate network name-space, these mounts are treated as quite
+separate from any mounts in a different container or not in a
+container (i.e. in a different network name-space).
+.P
+In the NFSv4 protocol, each client must have a unique identifier.
+This is used by the server to determine when a client has restarted,
+allowing any state from a previous instance to be discarded.  So any two
+concurrent clients that might access the same server MUST have
+different identifiers, and any two consecutive instances of the same
+client SHOULD have the same identifier.
+.P
+Linux constructs the identifier (referred to as 
+.B co_ownerid
+in the NFS specifications) from various pieces of information, three of
+which can be controlled by the sysadmin:
+.TP
+Hostname
+The hostname can be different in different containers if they
+have different "UTS" name-spaces.  If the container system ensures
+each container sees a unique host name, then this is
+sufficient for a correctly functioning NFS identifier.
+The host name is copied when the first NFS filesystem is mounted in
+a given network name-space.  Any subsequent change in the apparent
+hostname will not change the NFSv4 identifier.
+.TP
+.B nfs.nfs4_unique_id
+This module parameter is the same for all containers on a given host
+so it is not useful to differentiate between containers.
+.TP
+.B /sys/fs/nfs/client/net/identifier
+This virtual file (available since Linux 5.3) is local to the network
+name-space in which it is accessed and so can provide uniqueness between
+containers when the hostname is uniform among containers.
+.RS
+.PP
+This value is empty on name-space creation.
+If the value is to be set, that should be done before the first
+mount (much as the hostname is copied before the first mount).
+If the container system has access to some sort of per-container
+identity, then a command like
+.RS 4
+echo "$CONTAINER_IDENTITY" \\
+.br
+   > /sys/fs/nfs/client/net/identifier 
+.RE
+might be suitable.  If the container system provides no stable name,
+but does have stable storage, then something like
+.RS 4
+[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid && 
+.br
+cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier 
+.RE
+would suffice.
+.PP
+If a container has neither a stable name nor stable (local) storage,
+then it is not possible to provide a stable identifier, so providing
+a random one to ensure uniqueness would be best:
+.RS 4
+uuidgen > /sys/fs/nfs/client/net/identifier
+.RE
+.RE
 .SH FILES
 .TP 1.5i
 .I /etc/fstab