Message ID | 871spj1pfr.fsf@notabene.neil.brown.name (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/14/2017 01:51 AM, NeilBrown wrote: > Please try this (against a clean nfs-utils. i.e. remove the previous > patch). > It is a hack and would need to be totally re-written, but hopely the > measurements you make and strace that you report could be useful. > Also, for the strace, please use "-ttt" rather than "-tt" like I asked > before. It is easier to find the difference between two times with > -ttt. And add -T as well. Sorry this took so long but I had to be sure of my results. With your latest patch applied I am unable to mount my NFS shares and thus continue with the tests from before. Since the patch you provided cannot be applied cleanly to the source used to build '1:1.2.8-9ubuntu12.1' for Ubuntu Xenial I chose to build a couple of VMs to continue this testing. They should be identical to the machines I was using before except they are under powered and not using nfs-common, nfs-kernel-server package version 1:1.2.8-9ubuntu12.1. Most importantly the zfs datasets and the files in /etc/exports.d are the same. I found that utils/mountd/cache.c was last modified in 1.3.4, so I tested both 1.3.4 and 2.1.1 without your patch as a baseline. I was able to mount my shares in both versions. I then reverted my VMs to a state before the baseline install and installed again with your patch applied. I get the following with your patch applied on 1.3.4 and 2.1.1: root@nfsclient:~# mount -avv / : ignored /boot : already mounted none : ignored mount.nfs4: timeout set for Tue Jul 25 15:43:13 2017 mount.nfs4: trying text-based options 'ac,hard,rsize=16384,wsize=16384,vers=4.2,addr=10.135.24.23,clientaddr=10.135.24.22' mount.nfs4: mount(2): No such file or directory mount.nfs4: trying text-based options 'ac,hard,rsize=16384,wsize=16384,addr=10.135.24.23' mount.nfs4: prog 100003, trying vers=3, prot=6 mount.nfs4: trying 10.135.24.23 prog 100003 vers 3 prot TCP port 2049 mount.nfs4: prog 100005, trying vers=3, prot=17 mount.nfs4: trying 10.135.24.23 prog 100005 vers 3 prot UDP port 41598 mount.nfs4: mount(2): Permission denied mount.nfs4: access denied by server while mounting nfsserver.cs.uchicago.edu:/homes mount.nfs4: timeout set for Tue Jul 25 15:43:14 2017 mount.nfs4: trying text-based options 'ac,hard,rsize=16384,wsize=16384,vers=4.2,addr=10.135.24.23,clientaddr=10.135.24.22' mount.nfs4: mount(2): No such file or directory mount.nfs4: trying text-based options 'ac,hard,rsize=16384,wsize=16384,addr=10.135.24.23' mount.nfs4: prog 100003, trying vers=3, prot=6 mount.nfs4: trying 10.135.24.23 prog 100003 vers 3 prot TCP port 2049 mount.nfs4: prog 100005, trying vers=3, prot=17 mount.nfs4: trying 10.135.24.23 prog 100005 vers 3 prot UDP port 41598 mount.nfs4: mount(2): Permission denied mount.nfs4: access denied by server while mounting nfsserver.cs.uchicago.edu:/stage Attached are my install notes and the strace during the failed mount above.
On Tue, Jul 25 2017, Phil Kauffman wrote: > On 07/14/2017 01:51 AM, NeilBrown wrote: >> Please try this (against a clean nfs-utils. i.e. remove the previous >> patch). >> It is a hack and would need to be totally re-written, but hopely the >> measurements you make and strace that you report could be useful. >> Also, for the strace, please use "-ttt" rather than "-tt" like I asked >> before. It is easier to find the difference between two times with >> -ttt. And add -T as well. > > Sorry this took so long but I had to be sure of my results. It is worth being thorough. > > With your latest patch applied I am unable to mount my NFS shares and > thus continue with the tests from before. Weird. You are experiencing a bug that was very recently fixed, where if mount.nfs4 gets the error ENOENT from the server, it falls back to NFSv3. That explains some of the noise, but doesn't explain why you get ENOENT for the v4 mount. The strace output you've provided doesn't even show any attempts to read /etc/mtab, which my patch doesn't change at all. So it seems like the context is different in some way. Your nfs_test_notes.txt doesn't show /etc/export.d getting filled in ... maybe that it done automatically somehow... Can you try with unpatches 2.1.1? Also maybe provide an strace starting before any attempt to mount anything, and with an extra option "-s 1000".. Thanks, NeilBrown
On 07/26/2017 11:37 PM, NeilBrown wrote: > Weird. I thought so as well. > So it seems like the context is different in some way. It is in the sense that I am now using 1.3.4 and 2.1.1 vanilla on Ubuntu Xenial(provides 1.2.8) without any of debian's patches or niceties. > Your nfs_test_notes.txt doesn't show /etc/export.d getting filled in > ... maybe that it done automatically somehow... I copied the files from the original NFS server into /etc/exports.d as part of the VM snapshot I revert back to. My notes say 'mkdir -p /etc/exports.d', but this step is not necessary. root@nfsserver:~# ls /etc/exports.d/ | wc -l 972 I only added 3 extra files so that 'nfsclient' would be able to mount from the testing server (nfsserver). The ZFS datasets were also precreated: root@nfsserver:~# zfs list | wc -l 6062 > Can you try with unpatches 2.1.1? I assume you mean without any patches. vanilla 2.1.1 > Also maybe provide an strace starting before any attempt to mount > anything, and with an extra option "-s 1000".. This is all that happens *before* trying to mount from the client. root@nfsserver:~# systemctl restart nfs-server; exportfs -f; m=$(pgrep rpc.mountd); strace -s 1000 -ttt -T -p $m 2>&1 | tee strace_rpc.mountd2.txt strace: Process 2517 attached 1501190436.659800 select(1024, [3 4 5 7 8 9 10 11 12 13 14 15 16 17 18], NULL, NULL, NULL Stracing rpc.nfsd: root@nfsserver:~# exportfs -f; strace -s1000 -ttt -T /usr/sbin/rpc.nfsd 0 2>&1| tee strace_nfsd.txt; /usr/sbin/sm-notify http://people.cs.uchicago.edu/~kauffman/nfs-kernel-server/take3/strace_rpc.nfsd.txt When mounting from the client I get a strace that is 1.3G (it seems '-s' was not respected for some reason): http://people.cs.uchicago.edu/~kauffman/nfs-kernel-server/take3/strace_rpc.mountd.txt
On Thu, Jul 27 2017, Phil Kauffman wrote: > On 07/26/2017 11:37 PM, NeilBrown wrote: >> Weird. > I thought so as well. > >> So it seems like the context is different in some way. > It is in the sense that I am now using 1.3.4 and 2.1.1 vanilla on Ubuntu Xenial(provides 1.2.8) without any of debian's patches or niceties. > > >> Your nfs_test_notes.txt doesn't show /etc/export.d getting filled in >> ... maybe that it done automatically somehow... > I copied the files from the original NFS server into /etc/exports.d as part of the VM snapshot I revert back to. My notes say 'mkdir -p /etc/exports.d', but this step is not necessary. > > root@nfsserver:~# ls /etc/exports.d/ | wc -l > 972 > > I only added 3 extra files so that 'nfsclient' would be able to mount from the testing server (nfsserver). > > The ZFS datasets were also precreated: > > root@nfsserver:~# zfs list | wc -l > 6062 > > >> Can you try with unpatches 2.1.1? > I assume you mean without any patches. vanilla 2.1.1 > >> Also maybe provide an strace starting before any attempt to mount >> anything, and with an extra option "-s 1000".. > > This is all that happens *before* trying to mount from the client. > root@nfsserver:~# systemctl restart nfs-server; exportfs -f; m=$(pgrep rpc.mountd); strace -s 1000 -ttt -T -p $m 2>&1 | tee strace_rpc.mountd2.txt > strace: Process 2517 attached > 1501190436.659800 select(1024, [3 4 5 7 8 9 10 11 12 13 14 15 16 17 18], NULL, NULL, NULL > > Stracing rpc.nfsd: > root@nfsserver:~# exportfs -f; strace -s1000 -ttt -T /usr/sbin/rpc.nfsd 0 2>&1| tee strace_nfsd.txt; /usr/sbin/sm-notify > http://people.cs.uchicago.edu/~kauffman/nfs-kernel-server/take3/strace_rpc.nfsd.txt > > When mounting from the client I get a strace that is 1.3G (it seems '-s' was not respected for some reason): http://people.cs.uchicago.edu/~kauffman/nfs-kernel-server/take3/strace_rpc.mountd.txt > I'm sorry but I cannot get anything useful from these. It isn't crystal clear what was being tested in each case.... Could you try: 1/ build the current 'master' branch from the nfs-utils git tree, with no extra patches. 2/ Install that and perform the same test which produce the 1.3G strace file above. Provide the resulting strace of rpc.mountd. 3/ Tell me what happened when you mounted on the client. Did it work as expected? 4/ Apply the patch I provided on top of the 'master' branch but change hcreate_r(1000, &hdata); to hcreate_r(20000, &hdata); 5/ Install that and perform the same test as in step to, producing a new strace of rpc.mounted. Presumably it will be difference. 6/ Tell me what happened when you mounted on the client. Did it work as expected, or did it fail like I think it did last time. If one worked and the other didn't, I can compare the strace and try to find a difference. If both worked, then we should have useful information about how much speed up the patch provides (if any). If neither worked, then something weird is happening... NeilBrown
diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c index ca6c84f4d93d..db89a4feb7ae 100644 --- a/utils/mountd/cache.c +++ b/utils/mountd/cache.c @@ -18,6 +18,7 @@ #include <time.h> #include <netinet/in.h> #include <arpa/inet.h> +#include <sys/sysmacros.h> #include <unistd.h> #include <fcntl.h> #include <errno.h> @@ -299,7 +300,7 @@ static const long int nonblkid_filesystems[] = { 0 /* last */ }; -static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) +static int uuid_by_path(char *path, struct statfs64 *stp, int type, size_t uuidlen, char *uuid) { /* get a uuid for the filesystem found at 'path'. * There are several possible ways of generating the @@ -334,12 +335,17 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) const char *val; int rc; - rc = statfs64(path, &st); + if (stp) + rc = 0; + else { + stp = &st; + rc= statfs64(path, stp); + } if (type == 0 && rc == 0) { const long int *bad; for (bad = nonblkid_filesystems; *bad; bad++) { - if (*bad == st.f_type) + if (*bad == stp->f_type) break; } if (*bad == 0) @@ -347,9 +353,9 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid) } if (rc == 0 && - (st.f_fsid.__val[0] || st.f_fsid.__val[1])) + (stp->f_fsid.__val[0] || stp->f_fsid.__val[1])) snprintf(fsid_val, 17, "%08x%08x", - st.f_fsid.__val[0], st.f_fsid.__val[1]); + stp->f_fsid.__val[0], stp->f_fsid.__val[1]); else fsid_val[0] = 0; @@ -603,25 +609,64 @@ static int parse_fsid(int fsidtype, int fsidlen, char *fsid, return 0; } +#define entry FOO +#include <search.h> +struct pinfo { + unsigned int stb:1, mntp:1, stfs:1; + unsigned int mountpoint:1; + time_t valid; + struct stat statbuf; + struct statfs64 statfsbuf; +}; +static struct hsearch_data hdata; +static int hdata_init = 0; + + static bool match_fsid(struct parsed_fsid *parsed, nfs_export *exp, char *path) { - struct stat stb; + struct stat *stb; int type; char u[16]; + struct pinfo *pi; + ENTRY ent, *ret; - if (stat(path, &stb) != 0) - return false; - if (!S_ISDIR(stb.st_mode) && !S_ISREG(stb.st_mode)) + if (!hdata_init) { + hcreate_r(1000, &hdata); + hdata_init = 1; + } + ent.key = path; + if (hsearch_r(ent, FIND, &ret, &hdata) == 0) { + ent.key = strdup(path); + pi = xmalloc(sizeof(*pi)); + pi->stb = pi->mntp = pi->stfs = 0; + pi->valid = 0; + ent.data = pi; + hsearch_r(ent, ENTER, &ret, &hdata); + } else + pi = ret->data; + + if (pi->valid < time(NULL)+120) { + pi->stb = pi->mntp = pi->stfs = 0; + pi->valid = time(NULL); + } + + stb = &pi->statbuf; + if (!pi->stb) { + if (stat(path, stb) != 0) + return false; + pi->stb = 1; + } + if (!S_ISDIR(stb->st_mode) && !S_ISREG(stb->st_mode)) return false; switch (parsed->fsidtype) { case FSID_DEV: case FSID_MAJOR_MINOR: case FSID_ENCODE_DEV: - if (stb.st_ino != parsed->inode) + if (stb->st_ino != parsed->inode) return false; - if (parsed->major != major(stb.st_dev) || - parsed->minor != minor(stb.st_dev)) + if (parsed->major != major(stb->st_dev) || + parsed->minor != minor(stb->st_dev)) return false; return true; case FSID_NUM: @@ -631,12 +676,16 @@ static bool match_fsid(struct parsed_fsid *parsed, nfs_export *exp, char *path) return true; case FSID_UUID4_INUM: case FSID_UUID16_INUM: - if (stb.st_ino != parsed->inode) + if (stb->st_ino != parsed->inode) return false; goto check_uuid; case FSID_UUID8: case FSID_UUID16: - if (!is_mountpoint(path)) + if (!pi->mntp) { + pi->mountpoint = is_mountpoint(path); + pi->mntp = 1; + } + if (!pi->mountpoint) return false; check_uuid: if (exp->m_export.e_uuid) { @@ -644,12 +693,18 @@ static bool match_fsid(struct parsed_fsid *parsed, nfs_export *exp, char *path) if (memcmp(u, parsed->fhuuid, parsed->uuidlen) == 0) return true; } - else + else { + if (!pi->stfs) { + if (statfs64(path, &pi->statfsbuf) != 0) + return false; + pi->stfs = 1; + } for (type = 0; - uuid_by_path(path, type, parsed->uuidlen, u); + uuid_by_path(path, &pi->statfsbuf, type, parsed->uuidlen, u); type++) if (memcmp(u, parsed->fhuuid, parsed->uuidlen) == 0) return true; + } } /* Well, unreachable, actually: */ return false; @@ -727,10 +782,18 @@ static void nfsd_fh(int f) for (exp = exportlist[i].p_head; exp; exp = next_exp) { char *path; + if (!is_ipaddr_client(dom)) { + if (!namelist_client_matches(exp, dom)) + continue; + } else { + if (!ipaddr_client_matches(exp, ai)) + continue; + } + if (exp->m_export.e_flags & NFSEXP_CROSSMOUNT) { static nfs_export *prev = NULL; static void *mnt = NULL; - + if (prev == exp) { /* try a submount */ path = next_mnt(&mnt, exp->m_export.e_path); @@ -751,9 +814,6 @@ static void nfsd_fh(int f) next_exp = exp->m_next; } - if (!is_ipaddr_client(dom) - && !namelist_client_matches(exp, dom)) - continue; if (exp->m_export.e_mountpoint && !is_mountpoint(exp->m_export.e_mountpoint[0]? exp->m_export.e_mountpoint: @@ -762,9 +822,6 @@ static void nfsd_fh(int f) if (!match_fsid(&parsed, exp, path)) continue; - if (is_ipaddr_client(dom) - && !ipaddr_client_matches(exp, ai)) - continue; if (!found || subexport(&exp->m_export, found)) { found = &exp->m_export; free(found_path); @@ -906,7 +963,7 @@ static int dump_to_cache(int f, char *buf, int buflen, char *domain, write_secinfo(&bp, &blen, exp, flag_mask); if (exp->e_uuid == NULL || different_fs) { char u[16]; - if (uuid_by_path(path, 0, 16, u)) { + if (uuid_by_path(path, NULL, 0, 16, u)) { qword_add(&bp, &blen, "uuid"); qword_addhex(&bp, &blen, u, 16); }