diff mbox series

[RFC] mountd: export: Deal with NFS filesystems

Message ID 20220131104316.10357-1-richard@nod.at (mailing list archive)
State New, archived
Headers show
Series [RFC] mountd: export: Deal with NFS filesystems | expand

Commit Message

Richard Weinberger Jan. 31, 2022, 10:43 a.m. UTC
When a NFS filesystem is exported usually NFSD refuses
to export the filesystem because mountd was unable to provide
an UUID and the kernel cannot derive a dev id from the NFS client
super block.

To deal with this situation, teach uuid_by_path() how to generate
an UUID from a NFS filesystem.

Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
from the backend and use it as seed for mountd's UUID mechanism.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
Hello,

This is the userspace side to make cross mounts work for NFS re-exports.
The basic idea is that mountd takes the NFS fsid from the backend server
and feeds into mountd'd existing UUID mechanism.

Getting the NFS fsid is currently a bit clumsy, mountd has to scan
/proc/fs/nfsfs/volumes. With the upcoming fsinfo() system call this
information could be transferred in a much simpler way.
e.g. NFS client stores the fsid in its sb->s_uuid and fsinfo() exposes
sb->s_uuid to userspace.

This approach works for v3-to-v4 and v4-to-v4 re-exports.
v3-to-v3 does *not* work, the reason is that nfs_encode_fh() aborts due
to not enough space in the fhandle.
I'm still investigating on how to improve this case.

Currently I’m toying around with a NFS fsid to export FSID_NUM mappings
scheme which are stored to disk.
Another possibility is a new fsid type for NFSD, where the raw NFS fsid
from the backend server is mirrored. Although such a mirror mode makes
local exports impossible.
At the end of the day this is something the admin has to decide.
I don't think there will be a generic solution which satisfies all needs.

Looking forward for your feedback. :-)

Thanks,
//richard
---
 support/export/cache.c | 65 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 61 insertions(+), 4 deletions(-)

Comments

J. Bruce Fields Jan. 31, 2022, 5:01 p.m. UTC | #1
Thanks for continuing to look into this, it's interesting work.

On Mon, Jan 31, 2022 at 11:43:16AM +0100, Richard Weinberger wrote:
> When a NFS filesystem is exported usually NFSD refuses
> to export the filesystem because mountd was unable to provide
> an UUID and the kernel cannot derive a dev id from the NFS client
> super block.
> 
> To deal with this situation, teach uuid_by_path() how to generate
> an UUID from a NFS filesystem.
> 
> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
> from the backend and use it as seed for mountd's UUID mechanism.

Sorry, I haven't checked--what is that number, really?  It's probably
not the fsid returned from the server, as that wouldn't be guaranteed
unique across multiple servers.  But that means it's probably generated
in a way that doesn't guarantee it's stable across reboots.  And we need
filehandles to work across reboots.

I think we've got to generate fsid's ourself and keep an on-disk
database.

> This is the userspace side to make cross mounts work for NFS re-exports.
> The basic idea is that mountd takes the NFS fsid from the backend server
> and feeds into mountd'd existing UUID mechanism.
> 
> Getting the NFS fsid is currently a bit clumsy, mountd has to scan
> /proc/fs/nfsfs/volumes. With the upcoming fsinfo() system call this
> information could be transferred in a much simpler way.
> e.g. NFS client stores the fsid in its sb->s_uuid and fsinfo() exposes
> sb->s_uuid to userspace.
> 
> This approach works for v3-to-v4 and v4-to-v4 re-exports.
> v3-to-v3 does *not* work, the reason is that nfs_encode_fh() aborts due
> to not enough space in the fhandle.
> I'm still investigating on how to improve this case.

We might be able to do better, but I think a v4-only solution to start
would be OK.

> Currently I’m toying around with a NFS fsid to export FSID_NUM mappings
> scheme which are stored to disk.

I think something like that's more like what we need.

> Another possibility is a new fsid type for NFSD, where the raw NFS fsid
> from the backend server is mirrored. Although such a mirror mode makes
> local exports impossible.
> At the end of the day this is something the admin has to decide.
> I don't think there will be a generic solution which satisfies all needs.

We can provide options if we need to, but the costs of options can be
underestimated.  Ideally we'd at least have a default that's safe and
works for most cases.

--b.

> 
> Looking forward for your feedback. :-)
> 
> Thanks,
> //richard
> ---
>  support/export/cache.c | 65 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/support/export/cache.c b/support/export/cache.c
> index a5823e92e9f2..053ad86619b8 100644
> --- a/support/export/cache.c
> +++ b/support/export/cache.c
> @@ -331,6 +331,51 @@ static const unsigned long nonblkid_filesystems[] = {
>      0        /* last */
>  };
>  
> +static int get_uuid_from_fsid(char *path, char *uuid_str, size_t len)
> +{
> +	unsigned int min_dev, maj_dev, min_fsid, maj_fsid;
> +	int rc, n, found = 0, header_seen = 0;
> +	struct stat stb;
> +	FILE *nfsfs_fd;
> +	char line[128];
> +
> +	rc = nfsd_path_stat(path, &stb);
> +	if (rc) {
> +		xlog(L_WARNING, "Unable to stat %s", path);
> +		return 0;
> +	}
> +
> +	nfsfs_fd = fopen("/proc/fs/nfsfs/volumes", "r");
> +	if (nfsfs_fd == NULL) {
> +		xlog(L_WARNING, "Unable to open nfsfs volume file: %m");
> +		return 0;
> +	}
> +
> +	while (fgets(line, sizeof(line), nfsfs_fd) != NULL) {
> +		if (!header_seen) {
> +			header_seen = 1;
> +			continue;
> +		}
> +		n = sscanf(line, "v%*u %*x %*u %u:%u %x:%x %*s", &maj_dev,
> +			   &min_dev, &maj_fsid, &min_fsid);
> +
> +		if (n != 4) {
> +			xlog(L_WARNING, "Unable to parse nfsfs volume line: %d, %s", n, line);
> +			continue;
> +		}
> +
> +		if (makedev(maj_dev, min_dev) == stb.st_dev) {
> +			found = 1;
> +			snprintf(uuid_str, len, "%08x%08x", maj_fsid, min_fsid);
> +			break;
> +		}
> +	}
> +
> +	fclose(nfsfs_fd);
> +
> +	return found;
> +}
> +
>  static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
>  {
>  	/* get a uuid for the filesystem found at 'path'.
> @@ -362,7 +407,7 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
>  	 */
>  	struct statfs64 st;
>  	char fsid_val[17];
> -	const char *blkid_val = NULL;
> +	const char *fsuuid_val = NULL;
>  	const char *val;
>  	int rc;
>  
> @@ -375,7 +420,19 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
>  				break;
>  		}
>  		if (*bad == 0)
> -			blkid_val = get_uuid_blkdev(path);
> +			fsuuid_val = get_uuid_blkdev(path);
> +		else if (*bad == 0x6969 /* NFS_SUPER_MAGIC */) {
> +			char tmp[17];
> +			int ret = get_uuid_from_fsid(path, tmp, sizeof(tmp));
> +
> +			if (ret < 0) {
> +				xlog(L_WARNING, "Unable to read nfsfs volume file: %i", ret);
> +			} else if (ret == 0) {
> +				xlog(L_WARNING, "Unable to find nfsfs volume entry for %s", path);
> +			} else {
> +				fsuuid_val = tmp;
> +			}
> +		}
>  	}
>  
>  	if (rc == 0 &&
> @@ -385,8 +442,8 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
>  	else
>  		fsid_val[0] = 0;
>  
> -	if (blkid_val && (type--) == 0)
> -		val = blkid_val;
> +	if (fsuuid_val && (type--) == 0)
> +		val = fsuuid_val;
>  	else if (fsid_val[0] && (type--) == 0)
>  		val = fsid_val;
>  	else
> -- 
> 2.26.2
Richard Weinberger Jan. 31, 2022, 5:15 p.m. UTC | #2
----- Ursprüngliche Mail -----
> Von: "bfields" <bfields@fieldses.org>
>> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
>> from the backend and use it as seed for mountd's UUID mechanism.
> 
> Sorry, I haven't checked--what is that number, really?  It's probably
> not the fsid returned from the server, as that wouldn't be guaranteed
> unique across multiple servers.  But that means it's probably generated
> in a way that doesn't guarantee it's stable across reboots.  And we need
> filehandles to work across reboots.

Unless I badly misunderstood the code it comes from fs/nfs/client.c's
nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr().
So it should be unique:
https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9

Thanks,
//richard
J. Bruce Fields Jan. 31, 2022, 8:37 p.m. UTC | #3
On Mon, Jan 31, 2022 at 06:15:41PM +0100, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > Von: "bfields" <bfields@fieldses.org>
> >> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
> >> from the backend and use it as seed for mountd's UUID mechanism.
> > 
> > Sorry, I haven't checked--what is that number, really?  It's probably
> > not the fsid returned from the server, as that wouldn't be guaranteed
> > unique across multiple servers.  But that means it's probably generated
> > in a way that doesn't guarantee it's stable across reboots.  And we need
> > filehandles to work across reboots.
> 
> Unless I badly misunderstood the code it comes from fs/nfs/client.c's
> nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr().
> So it should be unique:
> https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9

Unique within one server, but not across different servers.

Depending on how your servers assign fsids, collisions might be quite
likely.  Results of a collision would be pretty bad.

--b.
Richard Weinberger Jan. 31, 2022, 9:57 p.m. UTC | #4
----- Ursprüngliche Mail -----
> Von: "bfields" <bfields@fieldses.org>
>> Unless I badly misunderstood the code it comes from fs/nfs/client.c's
>> nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr().
>> So it should be unique:
>> https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9
> 
> Unique within one server, but not across different servers.
> 
> Depending on how your servers assign fsids, collisions might be quite
> likely.  Results of a collision would be pretty bad.

Hmm. So this means when you have a setup where the same file system is exported
via multiple servers (the load balancing use case), getattr on the same file can
report a different fsid at any time?
This renders the fsid attribute close to useless :-(

Anyway, this is not a show stopper just makes more clear that we should
take the FSID_NUM mapping path.
Please stay tuned, I'll send ASAP another patch.

Thanks,
//richard
diff mbox series

Patch

diff --git a/support/export/cache.c b/support/export/cache.c
index a5823e92e9f2..053ad86619b8 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -331,6 +331,51 @@  static const unsigned long nonblkid_filesystems[] = {
     0        /* last */
 };
 
+static int get_uuid_from_fsid(char *path, char *uuid_str, size_t len)
+{
+	unsigned int min_dev, maj_dev, min_fsid, maj_fsid;
+	int rc, n, found = 0, header_seen = 0;
+	struct stat stb;
+	FILE *nfsfs_fd;
+	char line[128];
+
+	rc = nfsd_path_stat(path, &stb);
+	if (rc) {
+		xlog(L_WARNING, "Unable to stat %s", path);
+		return 0;
+	}
+
+	nfsfs_fd = fopen("/proc/fs/nfsfs/volumes", "r");
+	if (nfsfs_fd == NULL) {
+		xlog(L_WARNING, "Unable to open nfsfs volume file: %m");
+		return 0;
+	}
+
+	while (fgets(line, sizeof(line), nfsfs_fd) != NULL) {
+		if (!header_seen) {
+			header_seen = 1;
+			continue;
+		}
+		n = sscanf(line, "v%*u %*x %*u %u:%u %x:%x %*s", &maj_dev,
+			   &min_dev, &maj_fsid, &min_fsid);
+
+		if (n != 4) {
+			xlog(L_WARNING, "Unable to parse nfsfs volume line: %d, %s", n, line);
+			continue;
+		}
+
+		if (makedev(maj_dev, min_dev) == stb.st_dev) {
+			found = 1;
+			snprintf(uuid_str, len, "%08x%08x", maj_fsid, min_fsid);
+			break;
+		}
+	}
+
+	fclose(nfsfs_fd);
+
+	return found;
+}
+
 static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
 {
 	/* get a uuid for the filesystem found at 'path'.
@@ -362,7 +407,7 @@  static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
 	 */
 	struct statfs64 st;
 	char fsid_val[17];
-	const char *blkid_val = NULL;
+	const char *fsuuid_val = NULL;
 	const char *val;
 	int rc;
 
@@ -375,7 +420,19 @@  static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
 				break;
 		}
 		if (*bad == 0)
-			blkid_val = get_uuid_blkdev(path);
+			fsuuid_val = get_uuid_blkdev(path);
+		else if (*bad == 0x6969 /* NFS_SUPER_MAGIC */) {
+			char tmp[17];
+			int ret = get_uuid_from_fsid(path, tmp, sizeof(tmp));
+
+			if (ret < 0) {
+				xlog(L_WARNING, "Unable to read nfsfs volume file: %i", ret);
+			} else if (ret == 0) {
+				xlog(L_WARNING, "Unable to find nfsfs volume entry for %s", path);
+			} else {
+				fsuuid_val = tmp;
+			}
+		}
 	}
 
 	if (rc == 0 &&
@@ -385,8 +442,8 @@  static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
 	else
 		fsid_val[0] = 0;
 
-	if (blkid_val && (type--) == 0)
-		val = blkid_val;
+	if (fsuuid_val && (type--) == 0)
+		val = fsuuid_val;
 	else if (fsid_val[0] && (type--) == 0)
 		val = fsid_val;
 	else