Message ID | 20220131104316.10357-1-richard@nod.at (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] mountd: export: Deal with NFS filesystems | expand |
Thanks for continuing to look into this, it's interesting work. On Mon, Jan 31, 2022 at 11:43:16AM +0100, Richard Weinberger wrote: > When a NFS filesystem is exported usually NFSD refuses > to export the filesystem because mountd was unable to provide > an UUID and the kernel cannot derive a dev id from the NFS client > super block. > > To deal with this situation, teach uuid_by_path() how to generate > an UUID from a NFS filesystem. > > Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid > from the backend and use it as seed for mountd's UUID mechanism. Sorry, I haven't checked--what is that number, really? It's probably not the fsid returned from the server, as that wouldn't be guaranteed unique across multiple servers. But that means it's probably generated in a way that doesn't guarantee it's stable across reboots. And we need filehandles to work across reboots. I think we've got to generate fsid's ourself and keep an on-disk database. > This is the userspace side to make cross mounts work for NFS re-exports. > The basic idea is that mountd takes the NFS fsid from the backend server > and feeds into mountd'd existing UUID mechanism. > > Getting the NFS fsid is currently a bit clumsy, mountd has to scan > /proc/fs/nfsfs/volumes. With the upcoming fsinfo() system call this > information could be transferred in a much simpler way. > e.g. NFS client stores the fsid in its sb->s_uuid and fsinfo() exposes > sb->s_uuid to userspace. > > This approach works for v3-to-v4 and v4-to-v4 re-exports. > v3-to-v3 does *not* work, the reason is that nfs_encode_fh() aborts due > to not enough space in the fhandle. > I'm still investigating on how to improve this case. We might be able to do better, but I think a v4-only solution to start would be OK. > Currently I’m toying around with a NFS fsid to export FSID_NUM mappings > scheme which are stored to disk. I think something like that's more like what we need. > Another possibility is a new fsid type for NFSD, where the raw NFS fsid > from the backend server is mirrored. Although such a mirror mode makes > local exports impossible. > At the end of the day this is something the admin has to decide. > I don't think there will be a generic solution which satisfies all needs. We can provide options if we need to, but the costs of options can be underestimated. Ideally we'd at least have a default that's safe and works for most cases. --b. > > Looking forward for your feedback. :-) > > Thanks, > //richard > --- > support/export/cache.c | 65 +++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 61 insertions(+), 4 deletions(-) > > diff --git a/support/export/cache.c b/support/export/cache.c > index a5823e92e9f2..053ad86619b8 100644 > --- a/support/export/cache.c > +++ b/support/export/cache.c > @@ -331,6 +331,51 @@ static const unsigned long nonblkid_filesystems[] = { > 0 /* last */ > }; > > +static int get_uuid_from_fsid(char *path, char *uuid_str, size_t len) > +{ > + unsigned int min_dev, maj_dev, min_fsid, maj_fsid; > + int rc, n, found = 0, header_seen = 0; > + struct stat stb; > + FILE *nfsfs_fd; > + char line[128]; > + > + rc = nfsd_path_stat(path, &stb); > + if (rc) { > + xlog(L_WARNING, "Unable to stat %s", path); > + return 0; > + } > + > + nfsfs_fd = fopen("/proc/fs/nfsfs/volumes", "r"); > + if (nfsfs_fd == NULL) { > + xlog(L_WARNING, "Unable to open nfsfs volume file: %m"); > + return 0; > + } > + > + while (fgets(line, sizeof(line), nfsfs_fd) != NULL) { > + if (!header_seen) { > + header_seen = 1; > + continue; > + } > + n = sscanf(line, "v%*u %*x %*u %u:%u %x:%x %*s", &maj_dev, > + &min_dev, &maj_fsid, &min_fsid); > + > + if (n != 4) { > + xlog(L_WARNING, "Unable to parse nfsfs volume line: %d, %s", n, line); > + continue; > + } > + > + if (makedev(maj_dev, min_dev) == stb.st_dev) { > + found = 1; > + snprintf(uuid_str, len, "%08x%08x", maj_fsid, min_fsid); > + break; > + } > + } > + > + fclose(nfsfs_fd); > + > + return found; > +} > + > static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) > { > /* get a uuid for the filesystem found at 'path'. > @@ -362,7 +407,7 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) > */ > struct statfs64 st; > char fsid_val[17]; > - const char *blkid_val = NULL; > + const char *fsuuid_val = NULL; > const char *val; > int rc; > > @@ -375,7 +420,19 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) > break; > } > if (*bad == 0) > - blkid_val = get_uuid_blkdev(path); > + fsuuid_val = get_uuid_blkdev(path); > + else if (*bad == 0x6969 /* NFS_SUPER_MAGIC */) { > + char tmp[17]; > + int ret = get_uuid_from_fsid(path, tmp, sizeof(tmp)); > + > + if (ret < 0) { > + xlog(L_WARNING, "Unable to read nfsfs volume file: %i", ret); > + } else if (ret == 0) { > + xlog(L_WARNING, "Unable to find nfsfs volume entry for %s", path); > + } else { > + fsuuid_val = tmp; > + } > + } > } > > if (rc == 0 && > @@ -385,8 +442,8 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) > else > fsid_val[0] = 0; > > - if (blkid_val && (type--) == 0) > - val = blkid_val; > + if (fsuuid_val && (type--) == 0) > + val = fsuuid_val; > else if (fsid_val[0] && (type--) == 0) > val = fsid_val; > else > -- > 2.26.2
----- Ursprüngliche Mail ----- > Von: "bfields" <bfields@fieldses.org> >> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid >> from the backend and use it as seed for mountd's UUID mechanism. > > Sorry, I haven't checked--what is that number, really? It's probably > not the fsid returned from the server, as that wouldn't be guaranteed > unique across multiple servers. But that means it's probably generated > in a way that doesn't guarantee it's stable across reboots. And we need > filehandles to work across reboots. Unless I badly misunderstood the code it comes from fs/nfs/client.c's nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr(). So it should be unique: https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9 Thanks, //richard
On Mon, Jan 31, 2022 at 06:15:41PM +0100, Richard Weinberger wrote: > ----- Ursprüngliche Mail ----- > > Von: "bfields" <bfields@fieldses.org> > >> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid > >> from the backend and use it as seed for mountd's UUID mechanism. > > > > Sorry, I haven't checked--what is that number, really? It's probably > > not the fsid returned from the server, as that wouldn't be guaranteed > > unique across multiple servers. But that means it's probably generated > > in a way that doesn't guarantee it's stable across reboots. And we need > > filehandles to work across reboots. > > Unless I badly misunderstood the code it comes from fs/nfs/client.c's > nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr(). > So it should be unique: > https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9 Unique within one server, but not across different servers. Depending on how your servers assign fsids, collisions might be quite likely. Results of a collision would be pretty bad. --b.
----- Ursprüngliche Mail ----- > Von: "bfields" <bfields@fieldses.org> >> Unless I badly misunderstood the code it comes from fs/nfs/client.c's >> nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr(). >> So it should be unique: >> https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9 > > Unique within one server, but not across different servers. > > Depending on how your servers assign fsids, collisions might be quite > likely. Results of a collision would be pretty bad. Hmm. So this means when you have a setup where the same file system is exported via multiple servers (the load balancing use case), getattr on the same file can report a different fsid at any time? This renders the fsid attribute close to useless :-( Anyway, this is not a show stopper just makes more clear that we should take the FSID_NUM mapping path. Please stay tuned, I'll send ASAP another patch. Thanks, //richard
diff --git a/support/export/cache.c b/support/export/cache.c index a5823e92e9f2..053ad86619b8 100644 --- a/support/export/cache.c +++ b/support/export/cache.c @@ -331,6 +331,51 @@ static const unsigned long nonblkid_filesystems[] = { 0 /* last */ }; +static int get_uuid_from_fsid(char *path, char *uuid_str, size_t len) +{ + unsigned int min_dev, maj_dev, min_fsid, maj_fsid; + int rc, n, found = 0, header_seen = 0; + struct stat stb; + FILE *nfsfs_fd; + char line[128]; + + rc = nfsd_path_stat(path, &stb); + if (rc) { + xlog(L_WARNING, "Unable to stat %s", path); + return 0; + } + + nfsfs_fd = fopen("/proc/fs/nfsfs/volumes", "r"); + if (nfsfs_fd == NULL) { + xlog(L_WARNING, "Unable to open nfsfs volume file: %m"); + return 0; + } + + while (fgets(line, sizeof(line), nfsfs_fd) != NULL) { + if (!header_seen) { + header_seen = 1; + continue; + } + n = sscanf(line, "v%*u %*x %*u %u:%u %x:%x %*s", &maj_dev, + &min_dev, &maj_fsid, &min_fsid); + + if (n != 4) { + xlog(L_WARNING, "Unable to parse nfsfs volume line: %d, %s", n, line); + continue; + } + + if (makedev(maj_dev, min_dev) == stb.st_dev) { + found = 1; + snprintf(uuid_str, len, "%08x%08x", maj_fsid, min_fsid); + break; + } + } + + fclose(nfsfs_fd); + + return found; +} + static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) { /* get a uuid for the filesystem found at 'path'. @@ -362,7 +407,7 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) */ struct statfs64 st; char fsid_val[17]; - const char *blkid_val = NULL; + const char *fsuuid_val = NULL; const char *val; int rc; @@ -375,7 +420,19 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) break; } if (*bad == 0) - blkid_val = get_uuid_blkdev(path); + fsuuid_val = get_uuid_blkdev(path); + else if (*bad == 0x6969 /* NFS_SUPER_MAGIC */) { + char tmp[17]; + int ret = get_uuid_from_fsid(path, tmp, sizeof(tmp)); + + if (ret < 0) { + xlog(L_WARNING, "Unable to read nfsfs volume file: %i", ret); + } else if (ret == 0) { + xlog(L_WARNING, "Unable to find nfsfs volume entry for %s", path); + } else { + fsuuid_val = tmp; + } + } } if (rc == 0 && @@ -385,8 +442,8 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) else fsid_val[0] = 0; - if (blkid_val && (type--) == 0) - val = blkid_val; + if (fsuuid_val && (type--) == 0) + val = fsuuid_val; else if (fsid_val[0] && (type--) == 0) val = fsid_val; else
When a NFS filesystem is exported usually NFSD refuses to export the filesystem because mountd was unable to provide an UUID and the kernel cannot derive a dev id from the NFS client super block. To deal with this situation, teach uuid_by_path() how to generate an UUID from a NFS filesystem. Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid from the backend and use it as seed for mountd's UUID mechanism. Signed-off-by: Richard Weinberger <richard@nod.at> --- Hello, This is the userspace side to make cross mounts work for NFS re-exports. The basic idea is that mountd takes the NFS fsid from the backend server and feeds into mountd'd existing UUID mechanism. Getting the NFS fsid is currently a bit clumsy, mountd has to scan /proc/fs/nfsfs/volumes. With the upcoming fsinfo() system call this information could be transferred in a much simpler way. e.g. NFS client stores the fsid in its sb->s_uuid and fsinfo() exposes sb->s_uuid to userspace. This approach works for v3-to-v4 and v4-to-v4 re-exports. v3-to-v3 does *not* work, the reason is that nfs_encode_fh() aborts due to not enough space in the fhandle. I'm still investigating on how to improve this case. Currently I’m toying around with a NFS fsid to export FSID_NUM mappings scheme which are stored to disk. Another possibility is a new fsid type for NFSD, where the raw NFS fsid from the backend server is mirrored. Although such a mirror mode makes local exports impossible. At the end of the day this is something the admin has to decide. I don't think there will be a generic solution which satisfies all needs. Looking forward for your feedback. :-) Thanks, //richard --- support/export/cache.c | 65 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 61 insertions(+), 4 deletions(-)