diff mbox series

[RFC,2/6] exports: Implement new export option reexport=

Message ID 20220217131531.2890-3-richard@nod.at (mailing list archive)
State New, archived
Series nfs-utils: Improving NFS re-exports

Commit Message

Richard Weinberger Feb. 17, 2022, 1:15 p.m. UTC
When re-exporting an NFS volume it is mandatory to specify
either a UUID or a numerical fsid= option because nfsd is unable
to derive an identifier on its own.

For NFS cross mounts this becomes a problem because nfsd also
needs an identifier for every crossed mount.
A common workaround is to list every single subvolume in the
exports list too.
But this defeats the purpose of the crossmnt option and is tedious.

This is where the reexport= option tries to help.
It offers various strategies to automatically derive an identifier
for NFS volumes and subvolumes.
Each has its pros and cons.

Currently three modes are implemented:

1. auto-fsidnum
   In this mode mountd/exportd will create a new numerical fsid
   for an NFS volume and subvolume. The numbers are stored in a database
   such that the server will always use the same fsid.
   The entry in the exports file may omit fsid= entirely, but
   stating a UUID is allowed if needed.

   This mode has the obvious downside that load balancing is not
   possible since multiple re-exporting NFS servers would generate
   different ids.

2. predefined-fsidnum
   This mode works just like auto-fsidnum but does not generate ids
   for you. It helps in the load balancing case. A system administrator
   has to manually maintain the database and install it on all re-exporting
   NFS servers. If you have a massive number of subvolumes this mode
   will help because you don't have to bloat the exports list.

3. remote-devfsid
   If this mode is selected mountd/exportd will derive a UUID from the
   re-exported NFS volume's fsid (rfc7530 section-5.8.1.9).
   No further local state is needed on the re-exporting server.
   The export list entry still needs an fsid= setting because while
   parsing the exports file the NFS mounts might not be there yet.
   This mode is dangerous; use it only if you're absolutely sure that the
   NFS server you're re-exporting has a stable fsid. Chances are good
   that it can change.
   Since a UUID is derived, re-exporting from NFSv3 to NFSv3 is not
   possible. The file handle space is too small.
   NFSv3 to NFSv4 works, though.
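
To illustrate the remote-devfsid idea: the NFSv4 fsid attribute is a pair of
64-bit values (major and minor), so it is exactly as wide as a UUID. The
following is a minimal sketch only. The struct remote_fsid type, the
fsid_to_uuid_str() helper and the byte layout are assumptions made for
illustration and are not taken from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the NFSv4 fsid attribute (RFC 7530, section 2.2.5):
 * two 64-bit values, 128 bits in total, the same width as a UUID. */
struct remote_fsid {
	uint64_t fsid_major;
	uint64_t fsid_minor;
};

#define UUID_STR_LEN 37	/* 36 characters plus the terminating NUL */

/* Format major/minor as a UUID-shaped string (8-4-4-4-12 hex digits).
 * The layout is an assumption of this sketch, not the one used by the
 * patch series. Returns 0 on success, -1 if the buffer is too small. */
static int fsid_to_uuid_str(const struct remote_fsid *fsid, char *out, size_t len)
{
	int n;

	assert(fsid != NULL && out != NULL);
	if (len < UUID_STR_LEN)
		return -1;

	n = snprintf(out, len, "%08x-%04x-%04x-%04x-%012llx",
		     (unsigned int)(fsid->fsid_major >> 32),
		     (unsigned int)((fsid->fsid_major >> 16) & 0xffff),
		     (unsigned int)(fsid->fsid_major & 0xffff),
		     (unsigned int)(fsid->fsid_minor >> 48),
		     (unsigned long long)(fsid->fsid_minor & 0xffffffffffffULL));
	return n == UUID_STR_LEN - 1 ? 0 : -1;
}
```

With this layout, major 0x0123456789abcdef and minor 0xfedcba9876543210 format
as 01234567-89ab-cdef-fedc-ba9876543210. Because the string is a pure function
of the remote fsid, no local state is needed, but it is only as stable as the
remote fsid itself.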

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 support/include/nfslib.h   |  1 +
 support/nfs/Makefile.am    |  1 +
 support/nfs/exports.c      | 73 ++++++++++++++++++++++++++++++++++++++
 utils/exportfs/Makefile.am |  4 +++
 utils/mount/Makefile.am    |  6 ++++
 5 files changed, 85 insertions(+)

Comments

J. Bruce Fields March 8, 2022, 10:10 p.m. UTC | #1
On Thu, Feb 17, 2022 at 02:15:27PM +0100, Richard Weinberger wrote:
> When re-exporting an NFS volume it is mandatory to specify
> either a UUID or a numerical fsid= option because nfsd is unable
> to derive an identifier on its own.
> 
> For NFS cross mounts this becomes a problem because nfsd also
> needs an identifier for every crossed mount.
> A common workaround is to list every single subvolume in the
> exports list too.
> But this defeats the purpose of the crossmnt option and is tedious.
> 
> This is where the reexport= option tries to help.
> It offers various strategies to automatically derive an identifier
> for NFS volumes and subvolumes.
> Each has its pros and cons.
> 
> Currently three modes are implemented:
> 
> 1. auto-fsidnum
>    In this mode mountd/exportd will create a new numerical fsid
>    for an NFS volume and subvolume. The numbers are stored in a database
>    such that the server will always use the same fsid.
>    The entry in the exports file may omit fsid= entirely, but
>    stating a UUID is allowed if needed.
> 
>    This mode has the obvious downside that load balancing is not
>    possible since multiple re-exporting NFS servers would generate
>    different ids.

This is the one I think it makes sense to concentrate on first.  Ideally
it should Just Work without requiring any configuration.

And then eventually my hope is that we could replace sqlite by a
distributed database to get filehandles that are consistent across
multiple servers.

> 
> 2. predefined-fsidnum
>    This mode works just like auto-fsidnum but does not generate ids
>    for you. It helps in the load balancing case. A system administrator
>    has to manually maintain the database and install it on all re-exporting
>    NFS servers. If you have a massive number of subvolumes this mode
>    will help because you don't have to bloat the exports list.

OK, I can see that being sort of useful but it'd be nice if we could
start with something more automatic.

> 3. remote-devfsid
>    If this mode is selected mountd/exportd will derive a UUID from the
>    re-exported NFS volume's fsid (rfc7530 section-5.8.1.9).

How does the server take a filehandle with a UUID in it and map that
UUID back to the original fsid?

>    No further local state is needed on the re-exporting server.
>    The export list entry still needs an fsid= setting because while
>    parsing the exports file the NFS mounts might not be there yet.

I don't understand that bit.

>    This mode is dangerous; use it only if you're absolutely sure that the
>    NFS server you're re-exporting has a stable fsid. Chances are good
>    that it can change.

The fsid should be stable.

The case I'm worried about is the case where we're reexporting exports
from multiple servers.  Then there's nothing preventing the two servers
from accidentally picking the same fsid to represent different exports.

--b.

>    Since a UUID is derived, re-exporting from NFSv3 to NFSv3 is not
>    possible. The file handle space is too small.
>    NFSv3 to NFSv4 works, though.
Richard Weinberger March 9, 2022, 9:43 a.m. UTC | #2
Bruce,

----- Original Message -----
> From: "bfields" <bfields@fieldses.org>
>> 1. auto-fsidnum
>>    In this mode mountd/exportd will create a new numerical fsid
>>    for an NFS volume and subvolume. The numbers are stored in a database
>>    such that the server will always use the same fsid.
>>    The entry in the exports file may omit fsid= entirely, but
>>    stating a UUID is allowed if needed.
>> 
>>    This mode has the obvious downside that load balancing is not
>>    possible since multiple re-exporting NFS servers would generate
>>    different ids.
> 
> This is the one I think it makes sense to concentrate on first.  Ideally
> it should Just Work without requiring any configuration.

Agreed.
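
For reference, the behaviour under discussion boils down to a persistent
path-to-fsid map. The sketch below is only an in-memory stand-in for the
database. struct fsid_table and fsid_lookup_or_create() are hypothetical names,
and the allow_create flag is an assumption used to contrast auto-fsidnum
(create on a miss) with predefined-fsidnum (never create).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES  64
#define MAX_PATH_LEN 256

/* Hypothetical in-memory stand-in for the on-disk fsid database. */
struct fsid_table {
	char paths[MAX_ENTRIES][MAX_PATH_LEN];
	uint32_t fsids[MAX_ENTRIES];
	size_t count;
	uint32_t next_fsid;	/* start at 1: fsid=root already maps to 0 */
};

/* Return the fsid recorded for path. On a miss, assign the next free
 * number if allow_create is set (auto-fsidnum behaviour), otherwise
 * fail (predefined-fsidnum behaviour). Returns 0 on failure. */
static uint32_t fsid_lookup_or_create(struct fsid_table *t, const char *path,
				      int allow_create)
{
	size_t i;

	assert(t != NULL && path != NULL);

	for (i = 0; i < t->count; i++)
		if (strcmp(t->paths[i], path) == 0)
			return t->fsids[i];

	if (!allow_create || t->count == MAX_ENTRIES ||
	    strlen(path) >= MAX_PATH_LEN)
		return 0;

	strcpy(t->paths[t->count], path);	/* length checked above */
	t->fsids[t->count] = t->next_fsid;
	t->count++;
	return t->next_fsid++;
}
```

The same path always yields the same number, which is what keeps file handles
valid across restarts once the table is persisted. Two servers filling their
tables independently can still disagree, which is the load balancing problem
mentioned above.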
 
> And then eventually my hope is that we could replace sqlite by a
> distributed database to get filehandles that are consistent across
> multiple servers.

Sure. I see at least two options here:

a. Allow multiple SQL backends in nfs-utils. SQLite by default, but also remote MariaDB
or Postgres...

b. Place the SQLite database on a shared file system that is capable of file locks.
That way we can use SQLite as-is. We just need to handle the SQLITE_LOCKED case in the code.
Luckily, writes happen seldom, so this shouldn't be a big deal.
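
A rough sketch of what the retry handling for option b could look like. It is
deliberately independent of the SQLite API: enum db_status, retry_while_busy()
and fake_write() are hypothetical names for this example. As far as I know,
SQLite usually reports contention from another process as SQLITE_BUSY, while
SQLITE_LOCKED covers conflicts within one connection or a shared cache, so real
code would probably have to check both (or rely on sqlite3_busy_timeout()).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical status codes for this sketch; real code would test the
 * SQLite result codes instead. */
enum db_status { DB_OK = 0, DB_BUSY = 1, DB_ERROR = 2 };

typedef enum db_status (*db_op_fn)(void *ctx);

/* Run op(), retrying with a linearly growing delay while it reports
 * DB_BUSY. Gives up after max_tries attempts and returns the last status. */
static enum db_status retry_while_busy(db_op_fn op, void *ctx,
				       int max_tries, long base_delay_ms)
{
	enum db_status st = DB_ERROR;
	int attempt;

	assert(op != NULL && max_tries > 0 && base_delay_ms >= 0);

	for (attempt = 1; attempt <= max_tries; attempt++) {
		struct timespec ts;
		long delay_ms;

		st = op(ctx);
		if (st != DB_BUSY || attempt == max_tries)
			break;

		/* Back off a little longer after every failed attempt. */
		delay_ms = base_delay_ms * attempt;
		ts.tv_sec = delay_ms / 1000;
		ts.tv_nsec = (delay_ms % 1000) * 1000000L;
		nanosleep(&ts, NULL);
	}
	return st;
}

/* Stand-in for a database write: reports busy until *ctx reaches 0. */
static enum db_status fake_write(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return DB_BUSY;
	}
	return DB_OK;
}
```

Since writes only happen when a new volume is seen for the first time, a small
bounded retry like this should rarely trigger in practice.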

>> 
>> 2. predefined-fsidnum
>>    This mode works just like auto-fsidnum but does not generate ids
>>    for you. It helps in the load balancing case. A system administrator
>>    has to manually maintain the database and install it on all re-exporting
>>    NFS servers. If you have a massive number of subvolumes this mode
>>    will help because you don't have to bloat the exports list.
> 
> OK, I can see that being sort of useful but it'd be nice if we could
> start with something more automatic.
> 
>> 3. remote-devfsid
>>    If this mode is selected mountd/exportd will derive a UUID from the
>>    re-exported NFS volume's fsid (rfc7530 section-5.8.1.9).
> 
> How does the server take a filehandle with a UUID in it and map that
> UUID back to the original fsid?

knfsd does not need the original fsid. All it sees is the UUID.
If it needs to know which export belongs to a UUID, it asks mountd.
mountd then uses the regular UUID lookup.

>>    No further local state is needed on the re-exporting server.
>>    The export list entry still needs an fsid= setting because while
>>    parsing the exports file the NFS mounts might not be there yet.
> 
> I don't understand that bit.

I tried to explain that with this mode we don't need to store UUIDs or
fsids on disk.

>>    This mode is dangerous; use it only if you're absolutely sure that the
>>    NFS server you're re-exporting has a stable fsid. Chances are good
>>    that it can change.
> 
> The fsid should be stable.

Didn't you explain to me last time that it is not?
By fsid I mean:
https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9
https://datatracker.ietf.org/doc/html/rfc7530#section-2.2.5

So after a reboot the very same filesystem could be on different
disks, and the major/minor tuple would then be different (if the server
uses disk IDs as-is).
 
> The case I'm worried about is the case where we're reexporting exports
> from multiple servers.  Then there's nothing preventing the two servers
> from accidentally picking the same fsid to represent different exports.

That's a good point. Since /proc/fs/nfsfs/volumes shows all that information
we can add sanity checks to mountd.
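
Something along these lines, as a sketch only. It works on an in-memory list
instead of parsing /proc/fs/nfsfs/volumes, and struct nfs_volume and
find_fsid_collision() are hypothetical names, not existing mountd code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one mounted NFS volume. */
struct nfs_volume {
	const char *server;
	uint64_t fsid_major;
	uint64_t fsid_minor;
};

/* Return the index of the first volume whose fsid equals that of an
 * earlier volume from a different server, or -1 if there is none.
 * The same fsid seen twice from the same server is not a collision. */
static int find_fsid_collision(const struct nfs_volume *vols, size_t n)
{
	size_t i, j;

	assert(vols != NULL || n == 0);

	for (i = 1; i < n; i++) {
		for (j = 0; j < i; j++) {
			if (vols[i].fsid_major == vols[j].fsid_major &&
			    vols[i].fsid_minor == vols[j].fsid_minor &&
			    strcmp(vols[i].server, vols[j].server) != 0)
				return (int)i;
		}
	}
	return -1;
}
```

mountd could then refuse remote-devfsid for a colliding volume, or at least
log an error, instead of handing out two identical UUIDs.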

Thanks,
//richard

Patch

diff --git a/support/include/nfslib.h b/support/include/nfslib.h
index 6faba71b..0465a1ff 100644
--- a/support/include/nfslib.h
+++ b/support/include/nfslib.h
@@ -85,6 +85,7 @@  struct exportent {
 	struct sec_entry e_secinfo[SECFLAVOR_COUNT+1];
 	unsigned int	e_ttl;
 	char *		e_realpath;
+	int		e_reexport;
 };
 
 struct rmtabent {
diff --git a/support/nfs/Makefile.am b/support/nfs/Makefile.am
index 67e3a8e1..c4357e7d 100644
--- a/support/nfs/Makefile.am
+++ b/support/nfs/Makefile.am
@@ -9,6 +9,7 @@  libnfs_la_SOURCES = exports.c rmtab.c xio.c rpcmisc.c rpcdispatch.c \
 		   svc_socket.c cacheio.c closeall.c nfs_mntent.c \
 		   svc_create.c atomicio.c strlcat.c strlcpy.c
 libnfs_la_LIBADD = libnfsconf.la
+libnfs_la_CPPFLAGS = -I$(top_srcdir)/support/reexport
 
 libnfsconf_la_SOURCES = conffile.c xlog.c
 
diff --git a/support/nfs/exports.c b/support/nfs/exports.c
index 2c8f0752..13129d68 100644
--- a/support/nfs/exports.c
+++ b/support/nfs/exports.c
@@ -31,6 +31,7 @@ 
 #include "xlog.h"
 #include "xio.h"
 #include "pseudoflavors.h"
+#include "reexport.h"
 
 #define EXPORT_DEFAULT_FLAGS	\
   (NFSEXP_READONLY|NFSEXP_ROOTSQUASH|NFSEXP_GATHERED_WRITES|NFSEXP_NOSUBTREECHECK)
@@ -103,6 +104,7 @@  static void init_exportent (struct exportent *ee, int fromkernel)
 	ee->e_nsqgids = 0;
 	ee->e_uuid = NULL;
 	ee->e_ttl = default_ttl;
+	ee->e_reexport = REEXP_NONE;
 }
 
 struct exportent *
@@ -302,6 +304,26 @@  putexportent(struct exportent *ep)
 	}
 	if (ep->e_uuid)
 		fprintf(fp, "fsid=%s,", ep->e_uuid);
+
+	if (ep->e_reexport) {
+		fprintf(fp, "reexport=");
+		switch (ep->e_reexport) {
+			case REEXP_AUTO_FSIDNUM:
+				fprintf(fp, "auto-fsidnum");
+				break;
+			case REEXP_PREDEFINED_FSIDNUM:
+				fprintf(fp, "predefined-fsidnum");
+				break;
+			case REEXP_REMOTE_DEVFSID:
+				fprintf(fp, "remote-devfsid");
+				break;
+			default:
+				xlog(L_ERROR, "unknown reexport method %i", ep->e_reexport);
+				fprintf(fp, "none");
+		}
+		fprintf(fp, ",");
+	}
+
 	if (ep->e_mountpoint)
 		fprintf(fp, "mountpoint%s%s,",
 			ep->e_mountpoint[0]?"=":"", ep->e_mountpoint);
@@ -538,6 +560,7 @@  parseopts(char *cp, struct exportent *ep, int warn, int *had_subtree_opt_ptr)
 	char 	*flname = efname?efname:"command line";
 	int	flline = efp?efp->x_line:0;
 	unsigned int active = 0;
+	int saw_reexport = 0;
 
 	squids = ep->e_squids; nsquids = ep->e_nsquids;
 	sqgids = ep->e_sqgids; nsqgids = ep->e_nsqgids;
@@ -644,6 +667,13 @@  bad_option:
 			}
 		} else if (strncmp(opt, "fsid=", 5) == 0) {
 			char *oe;
+
+			if (saw_reexport) {
+				xlog(L_ERROR, "%s:%d: 'fsid=' has to be before 'reexport=' %s\n",
+				     flname, flline, opt);
+				goto bad_option;
+			}
+
 			if (strcmp(opt+5, "root") == 0) {
 				ep->e_fsid = 0;
 				setflags(NFSEXP_FSID, active, ep);
@@ -688,6 +718,49 @@  bad_option:
 			active = parse_flavors(opt+4, ep);
 			if (!active)
 				goto bad_option;
+		} else if (strncmp(opt, "reexport=", 9) == 0) {
+#ifdef HAVE_REEXPORT_SUPPORT
+			char *strategy = strchr(opt, '=');
+
+			if (!strategy) {
+				xlog(L_ERROR, "%s:%d: bad option %s\n",
+				     flname, flline, opt);
+				goto bad_option;
+			}
+			strategy++;
+
+			if (saw_reexport) {
+				xlog(L_ERROR, "%s:%d: only one 'reexport=' is allowed: %s\n",
+				     flname, flline, opt);
+				goto bad_option;
+			}
+
+			if (strcmp(strategy, "auto-fsidnum") == 0) {
+				ep->e_reexport = REEXP_AUTO_FSIDNUM;
+			} else if (strcmp(strategy, "predefined-fsidnum") == 0) {
+				ep->e_reexport = REEXP_PREDEFINED_FSIDNUM;
+			} else if (strcmp(strategy, "remote-devfsid") == 0) {
+				ep->e_reexport = REEXP_REMOTE_DEVFSID;
+			} else if (strcmp(strategy, "none") == 0) {
+				ep->e_reexport = REEXP_NONE;
+			} else {
+				xlog(L_ERROR, "%s:%d: bad option %s\n",
+				     flname, flline, strategy);
+				goto bad_option;
+			}
+
+			if (reexpdb_apply_reexport_settings(ep, flname, flline) != 0)
+				goto bad_option;
+
+			if (ep->e_fsid)
+				setflags(NFSEXP_FSID, active, ep);
+
+			saw_reexport = 1;
+#else
+			xlog(L_ERROR, "%s:%d: 'reexport=' not available, rebuild with --enable-reexport\n",
+			     flname, flline);
+			goto bad_option;
+#endif
 		} else {
 			xlog(L_ERROR, "%s:%d: unknown keyword \"%s\"\n",
 					flname, flline, opt);
diff --git a/utils/exportfs/Makefile.am b/utils/exportfs/Makefile.am
index 96524c72..9eabef14 100644
--- a/utils/exportfs/Makefile.am
+++ b/utils/exportfs/Makefile.am
@@ -12,4 +12,8 @@  exportfs_LDADD = ../../support/export/libexport.a \
 		 ../../support/misc/libmisc.a \
 		 $(LIBWRAP) $(LIBNSL) $(LIBPTHREAD)
 
+if CONFIG_REEXPORT
+exportfs_LDADD += ../../support/reexport/libreexport.a $(LIBSQLITE) -lrt
+endif
+
 MAINTAINERCLEANFILES = Makefile.in
diff --git a/utils/mount/Makefile.am b/utils/mount/Makefile.am
index 3101f7ab..f4d5b182 100644
--- a/utils/mount/Makefile.am
+++ b/utils/mount/Makefile.am
@@ -32,6 +32,12 @@  mount_nfs_LDADD = ../../support/nfs/libnfs.la \
 		  ../../support/misc/libmisc.a \
 		  $(LIBTIRPC)
 
+if CONFIG_REEXPORT
+mount_nfs_LDADD += ../../support/reexport/libreexport.a \
+		   $(LIBSQLITE) -lrt $(LIBPTHREAD)
+endif
+
+
 mount_nfs_SOURCES = $(mount_common)
 
 if CONFIG_LIBMOUNT