
[v5,19/19] nfs: add Documentation/filesystems/nfs/localio.rst

Message ID 20240618201949.81977-20-snitzer@kernel.org
State New
Series nfs/nfsd: add support for localio

Commit Message

Mike Snitzer June 18, 2024, 8:19 p.m. UTC
This document gives an overview of the LOCALIO protocol extension
added to the Linux NFS client and server (both v3 and v4) to allow a
client and server to reliably handshake to determine if they are on
the same host.  The LOCALIO protocol extension follows the well-worn
pattern established by the ACL protocol extension.

The robust handshake between local client and server is just the
beginning; the ultimate use case this locality makes possible is for
the client to issue reads, writes and commits directly to the
server without having to go over the network.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
 include/linux/nfslocalio.h                |   2 +
 2 files changed, 103 insertions(+)
 create mode 100644 Documentation/filesystems/nfs/localio.rst

Comments

Chuck Lever III June 18, 2024, 9:46 p.m. UTC | #1
On Tue, Jun 18, 2024 at 04:19:49PM -0400, Mike Snitzer wrote:
> This document gives an overview of the LOCALIO protocol extension
> added to the Linux NFS client and server (both v3 and v4) to allow a
> client and server to reliably handshake to determine if they are on
> the same host.  The LOCALIO protocol extension follows the well-worn
> pattern established by the ACL protocol extension.
> 
> The robust handshake between local client and server is just the
> beginning, the ultimate use-case this locality makes possible is the
> client is able to issue reads, writes and commits directly to the
> server without having to go over the network.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
>  include/linux/nfslocalio.h                |   2 +
>  2 files changed, 103 insertions(+)
>  create mode 100644 Documentation/filesystems/nfs/localio.rst
> 
> diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> new file mode 100644
> index 000000000000..4b4595037a7f
> --- /dev/null
> +++ b/Documentation/filesystems/nfs/localio.rst
> @@ -0,0 +1,101 @@
> +===========
> +NFS localio
> +===========
> +
> +This document gives an overview of the LOCALIO protocol extension added
> +to the Linux NFS client and server (both v3 and v4) to allow a client
> +and server to reliably handshake to determine if they are on the same
> +host.  The LOCALIO protocol extension follows the well-worn pattern
> +established by the ACL protocol extension.
> +
> +The LOCALIO protocol extension is needed to allow robust discovery of
> +clients local to their servers.  Prior to this extension a fragile
> +sockaddr network address based match against all local network
> +interfaces was attempted.  But unlike the LOCALIO protocol extension,
> +the sockaddr-based matching didn't handle use of iptables or containers.
> +
> +The robust handshake between local client and server is just the
> +beginning, the ultimate use-case this locality makes possible is the
> +client is able to issue reads, writes and commits directly to the server
> +without having to go over the network.  This is particularly useful for
> +container usecases (e.g. kubernetes) where it is possible to run an IO
> +job local to the server.
> +
> +The performance advantage realized from localio's ability to bypass
> +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> +-  With localio:
> +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> +-  Without localio:
> +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> +
> +RPC
> +---
> +
> +The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
> +the client to retrieve a server's uuid.  LOCALIOPROC_GETUUID encodes the
> +server's uuid_t in terms of the fixed UUID_SIZE (16 bytes).  The fixed
> +size opaque encode and decode XDR methods are used instead of the less
> +efficient variable sized methods.

I'm reading between the lines ("well-worn pattern established by
the [NFS]ACL protocol"). I'm guessing that the client and server
will exchange this protocol on the same connection as NFS traffic?

The use of the term "extension" in this Document might be atypical.
An /extension/ means that the base RPC program (NFS in this case)
is somehow modified. However, if LOCALIO is a distinct RPC program
then this isn't an extension of the NFS protocol, per se.

A protocol spec needs to include:

o The RPC program and version number

o A description of each of its procedures, along with an XDR definition
  of its arguments and results

o Any related constants or bit mask values

And any details about a fixed destination port, or that
implementations should expect this RPC program to appear on the same
connection or transport as some other RPC program.

If this is a real extension of the NFS protocol, then I think the
usual rules apply of requiring standards action before we can merge
a Linux implementation of the extension. But I don't think that's
what you're doing...? That needs to be made more clear.


> +
> +NFS Common and Server
> +---------------------
> +
> +First use is in nfsd, to add access to a global nfsd_uuids list in
> +nfs_common that is used to register and then identify local nfsd
> +instances.
> +
> +nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
> +composed of nfsd_uuid_t instances that are managed as nfsd creates them
> +(per network namespace).
> +
> +nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
> +nfsd for the client specified nfsd uuid.
> +
> +The nfsd_uuids list is the basis for localio enablement, as such it has
> +members that point to nfsd memory for direct use by the client
> +(e.g. 'net' is the server's network namespace, through it the client can
> +access nn->nfsd_serv with proper rcu read access).  It is this client
> +and server synchronization that enables advanced usage and lifetime of
> +objects to span from the host kernel's nfsd to per-container knfsd
> +instances that are connected to nfs client's running on the same local
> +host.
> +
> +NFS Client
> +----------
> +
> +fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
> +LOCALIO protocol and check if the server with that uuid is known to be
> +local.  This ensures client and server 1: support localio 2: are local
> +to each other.
> +
> +See fs/nfs/localio.c:nfs_local_open_fh() and
> +fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
> +focused use of nfsd_uuid_t struct to allow a client local to a server to
> +open a file pointer without needing to go over the network.
> +
> +The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
> +server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
> +both the nfsd network namespace and the associated nn->nfsd_serv in
> +terms of RCU.  If nfsd_open_local_fh() finds that client no longer sees
> +valid nfsd objects (be it struct net or nn->nfsd_serv) it return ENXIO
> +to nfs_local_open_fh() and the client will try to reestablish the
> +LOCALIO resources needed by calling nfs_local_probe() again.  This
> +recovery is needed if/when an nfsd instance running in a container were
> +to reboot while a localio client is connected to it.
> +
> +Testing
> +-------
> +
> +The LOCALIO protocol extension and associated NFS localio read, right
> +and commit access have proven stable against various test scenarios:
> +
> +-  Client and server both on localhost (for both v3 and v4.2).
> +
> +-  Various permutations of client and server support enablement for
> +   both local and remote client and server.  Testing against NFS storage
> +   products that don't support the LOCALIO protocol was also performed.
> +
> +-  Client on host, server within a container (for both v3 and v4.2)
> +   The container testing was in terms of podman managed containers and
> +   includes container stop/restart scenario.

This isn't what I meant by a section on testing.

I meant "How would I go about testing this myself? What tests are
publicly available or part of existing NFS test suites we commonly
use?"

So, this Documentation needs a recipe for setting up a client/server
with LOCALIO and some details about how it can be tested.

What you wrote is appropriate for the series cover letter.


> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index c9592ad0afe2..a9722e18b527 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -20,6 +20,8 @@ extern struct list_head nfsd_uuids;
>   * Each nfsd instance has an nfsd_uuid_t that is accessible through the
>   * global nfsd_uuids list. Useful to allow a client to negotiate if localio
>   * possible with its server.
> + *
> + * See Documentation/filesystems/nfs/localio.rst for more detail.
>   */
>  typedef struct {
>  	uuid_t uuid;
> -- 
> 2.44.0
>
NeilBrown June 19, 2024, 5:47 a.m. UTC | #2
On Wed, 19 Jun 2024, Chuck Lever wrote:
> On Tue, Jun 18, 2024 at 04:19:49PM -0400, Mike Snitzer wrote:
> > This document gives an overview of the LOCALIO protocol extension
> > added to the Linux NFS client and server (both v3 and v4) to allow a
> > client and server to reliably handshake to determine if they are on
> > the same host.  The LOCALIO protocol extension follows the well-worn
> > pattern established by the ACL protocol extension.
> > 
> > The robust handshake between local client and server is just the
> > beginning, the ultimate use-case this locality makes possible is the
> > client is able to issue reads, writes and commits directly to the
> > server without having to go over the network.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
> >  include/linux/nfslocalio.h                |   2 +
> >  2 files changed, 103 insertions(+)
> >  create mode 100644 Documentation/filesystems/nfs/localio.rst
> > 
> > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > new file mode 100644
> > index 000000000000..4b4595037a7f
> > --- /dev/null
> > +++ b/Documentation/filesystems/nfs/localio.rst
> > @@ -0,0 +1,101 @@
> > +===========
> > +NFS localio
> > +===========
> > +
> > +This document gives an overview of the LOCALIO protocol extension added
> > +to the Linux NFS client and server (both v3 and v4) to allow a client
> > +and server to reliably handshake to determine if they are on the same
> > +host.  The LOCALIO protocol extension follows the well-worn pattern
> > +established by the ACL protocol extension.
> > +
> > +The LOCALIO protocol extension is needed to allow robust discovery of
> > +clients local to their servers.  Prior to this extension a fragile
> > +sockaddr network address based match against all local network
> > +interfaces was attempted.  But unlike the LOCALIO protocol extension,
> > +the sockaddr-based matching didn't handle use of iptables or containers.
> > +
> > +The robust handshake between local client and server is just the
> > +beginning, the ultimate use-case this locality makes possible is the
> > +client is able to issue reads, writes and commits directly to the server
> > +without having to go over the network.  This is particularly useful for
> > +container usecases (e.g. kubernetes) where it is possible to run an IO
> > +job local to the server.
> > +
> > +The performance advantage realized from localio's ability to bypass
> > +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> > +-  With localio:
> > +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > +-  Without localio:
> > +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > +
> > +RPC
> > +---
> > +
> > +The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
> > +the client to retrieve a server's uuid.  LOCALIOPROC_GETUUID encodes the
> > +server's uuid_t in terms of the fixed UUID_SIZE (16 bytes).  The fixed
> > +size opaque encode and decode XDR methods are used instead of the less
> > +efficient variable sized methods.
> 
> I'm reading between the lines ("well-worn pattern established by
> the [NFS]ACL protocol"). I'm guessing that the client and server
> will exchange this protocol on the same connection as NFS traffic?
> 
> The use of the term "extension" in this Document might be atypical.
> An /extension/ means that the base RPC program (NFS in this case)
> is somehow modified. However, if LOCALIO is a distinct RPC program
> then this isn't an extension of the NFS protocol, per se.
> 
> A protocol spec needs to include:
> 
> o The RPC program and version number
> 
> o A description of each its procedures, along with an XDR definition
>   of its arguments and results
> 
> o Any related constants or bit mask values

Note that providing this information in the format of a ".x" file as
understood by rpcgen is a good approach.

It isn't clear to me why you implement both v3 and v4 of the LOCALIO
program.  I don't see how they relate to the NFS protocol version.  Just
implement v1 which simply returns the UUID.

Thanks,
NeilBrown
Mike Snitzer June 19, 2024, 6:27 p.m. UTC | #3
On Wed, Jun 19, 2024 at 03:47:05PM +1000, NeilBrown wrote:
> On Wed, 19 Jun 2024, Chuck Lever wrote:
> > On Tue, Jun 18, 2024 at 04:19:49PM -0400, Mike Snitzer wrote:
> > > This document gives an overview of the LOCALIO protocol extension
> > > added to the Linux NFS client and server (both v3 and v4) to allow a
> > > client and server to reliably handshake to determine if they are on
> > > the same host.  The LOCALIO protocol extension follows the well-worn
> > > pattern established by the ACL protocol extension.
> > > 
> > > The robust handshake between local client and server is just the
> > > beginning, the ultimate use-case this locality makes possible is the
> > > client is able to issue reads, writes and commits directly to the
> > > server without having to go over the network.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > >  Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
> > >  include/linux/nfslocalio.h                |   2 +
> > >  2 files changed, 103 insertions(+)
> > >  create mode 100644 Documentation/filesystems/nfs/localio.rst
> > > 
> > > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > > new file mode 100644
> > > index 000000000000..4b4595037a7f
> > > --- /dev/null
> > > +++ b/Documentation/filesystems/nfs/localio.rst
> > > @@ -0,0 +1,101 @@
> > > +===========
> > > +NFS localio
> > > +===========
> > > +
> > > +This document gives an overview of the LOCALIO protocol extension added
> > > +to the Linux NFS client and server (both v3 and v4) to allow a client
> > > +and server to reliably handshake to determine if they are on the same
> > > +host.  The LOCALIO protocol extension follows the well-worn pattern
> > > +established by the ACL protocol extension.
> > > +
> > > +The LOCALIO protocol extension is needed to allow robust discovery of
> > > +clients local to their servers.  Prior to this extension a fragile
> > > +sockaddr network address based match against all local network
> > > +interfaces was attempted.  But unlike the LOCALIO protocol extension,
> > > +the sockaddr-based matching didn't handle use of iptables or containers.
> > > +
> > > +The robust handshake between local client and server is just the
> > > +beginning, the ultimate use-case this locality makes possible is the
> > > +client is able to issue reads, writes and commits directly to the server
> > > +without having to go over the network.  This is particularly useful for
> > > +container usecases (e.g. kubernetes) where it is possible to run an IO
> > > +job local to the server.
> > > +
> > > +The performance advantage realized from localio's ability to bypass
> > > +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> > > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> > > +-  With localio:
> > > +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > > +-  Without localio:
> > > +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > > +
> > > +RPC
> > > +---
> > > +
> > > +The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
> > > +the client to retrieve a server's uuid.  LOCALIOPROC_GETUUID encodes the
> > > +server's uuid_t in terms of the fixed UUID_SIZE (16 bytes).  The fixed
> > > +size opaque encode and decode XDR methods are used instead of the less
> > > +efficient variable sized methods.
> > 
> > I'm reading between the lines ("well-worn pattern established by
> > the [NFS]ACL protocol"). I'm guessing that the client and server
> > will exchange this protocol on the same connection as NFS traffic?
> > 
> > The use of the term "extension" in this Document might be atypical.
> > An /extension/ means that the base RPC program (NFS in this case)
> > is somehow modified. However, if LOCALIO is a distinct RPC program
> > then this isn't an extension of the NFS protocol, per se.
> > 
> > A protocol spec needs to include:
> > 
> > o The RPC program and version number
> > 
> > o A description of each its procedures, along with an XDR definition
> >   of its arguments and results
> > 
> > o Any related constants or bit mask values
> 
> Note that providing this information in the format of a ".x" file as
> understood by rpcgen is a good approach.

I've approximated that in an update for v6, but I'm sure it'll leave
you and Chuck wanting ;)
 
> It isn't clear to me why you implement both v3 and v4 of the LOCALIO
> program.  I don't see how they relate to the NFS protocol version.  Just
> implement v1 which simply returns the UUID.

Yeah, I'd love to pull it out to be standalone but in practice the
pattern I followed from NFS ACL (to use rpc_bind_new_program) took me
down the path of implementing it for both v3 and v4.  It did help to
put the endpoints to action by leveraging what NFS already provides
for encoding status though.

Would be nice to avoid it but it isn't immediately clear to me how.
Can be done as followup work but it'd take me some time to sort it
out -- might be you could cut through it more easily?

Only having a single LOCALIO protocol version would allow for
nfs_init_localioclient() to not need 'vers' to be specified. And it'd
remove the need for the .init_localioclient hook I added (as well as
the use of __always_inline to share nfs_init_localioclient between
fs/nfs/nfs[34]client.c)

Mike

Patch

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..4b4595037a7f
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,101 @@ 
+===========
+NFS localio
+===========
+
+This document gives an overview of the LOCALIO protocol extension added
+to the Linux NFS client and server (both v3 and v4) to allow a client
+and server to reliably handshake to determine if they are on the same
+host.  The LOCALIO protocol extension follows the well-worn pattern
+established by the ACL protocol extension.
+
+The LOCALIO protocol extension is needed to allow robust discovery of
+clients local to their servers.  Prior to this extension, a fragile
+match of a sockaddr network address against all local network
+interfaces was attempted.  But unlike the LOCALIO protocol extension,
+that sockaddr-based matching didn't handle iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning; the ultimate use case this locality makes possible is for
+the client to issue reads, writes and commits directly to the server
+without having to go over the network.  This is particularly useful for
+container use cases (e.g. Kubernetes) where it is possible to run an IO
+job local to the server.
+
+The performance advantage realized from localio's ability to bypass
+XDR and RPC for reads, writes and commits can be extreme, e.g. fio for
+20 seconds, 24 libaio threads, 64k direct I/O reads, queue depth of 8::
+
+  With localio:    read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
+
+  Without localio: read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
+
+RPC
+---
+
+The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
+the client to retrieve a server's uuid.  LOCALIOPROC_GETUUID encodes the
+server's uuid_t as a fixed-size opaque of UUID_SIZE (16) bytes.  The
+fixed-size opaque XDR encode and decode methods are used instead of the
+less efficient variable-sized methods.
+
+NFS Common and Server
+---------------------
+
+The first use is in nfsd, which gains access to a global nfsd_uuids list
+in nfs_common that is used to register and then identify local nfsd
+instances.
+
+nfsd_uuids is protected by the nfsd_mutex or the RCU read lock and is
+composed of nfsd_uuid_t instances that are managed as nfsd creates them
+(one per network namespace).
+
+nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
+nfsd instances for the client-specified nfsd uuid.
+
+The nfsd_uuids list is the basis for localio enablement; as such, it has
+members that point to nfsd memory for direct use by the client (e.g.
+'net' is the server's network namespace; through it the client can
+access nn->nfsd_serv under proper RCU read-side protection).  It is this
+client and server synchronization that enables advanced usage and allows
+the lifetime of objects to span from the host kernel's nfsd to
+per-container knfsd instances that are connected to NFS clients running
+on the same local host.
+
+NFS Client
+----------
+
+fs/nfs/localio.c:nfs_local_probe() retrieves a server's uuid via the
+LOCALIO protocol and checks whether the server with that uuid is known
+to be local.  This ensures that the client and server 1) both support
+localio and 2) are local to each other.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of the nfsd_uuid_t struct to allow a client local to a
+server to open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() calls into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully accesses
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU.  If nfsd_open_local_fh() finds that the client no longer
+sees valid nfsd objects (be it struct net or nn->nfsd_serv), it returns
+ENXIO to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again.  This
+recovery is needed if/when an nfsd instance running in a container
+restarts while a localio client is connected to it.
+
+Testing
+-------
+
+The LOCALIO protocol extension and associated NFS localio read, write
+and commit access have proven stable against various test scenarios:
+
+-  Client and server both on localhost (for both v3 and v4.2).
+
+-  Various permutations of client and server support enablement for
+   both local and remote client and server.  Testing against NFS storage
+   products that don't support the LOCALIO protocol was also performed.
+
+-  Client on host, server within a container (for both v3 and v4.2).
+   The container testing was done with podman-managed containers and
+   includes a container stop/restart scenario.
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c9592ad0afe2..a9722e18b527 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -20,6 +20,8 @@  extern struct list_head nfsd_uuids;
  * Each nfsd instance has an nfsd_uuid_t that is accessible through the
  * global nfsd_uuids list. Useful to allow a client to negotiate if localio
  * possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
  */
 typedef struct {
 	uuid_t uuid;