Message ID: 20220217131531.2890-1-richard@nod.at (mailing list archive)
Series: nfs-utils: Improving NFS re-exports
On Thu, Feb 17, 2022 at 02:15:25PM +0100, Richard Weinberger wrote:
> This is the second iteration of the NFS re-export improvement series
> for nfs-utils.
> While the kernel side didn't change at all and is still small,
> the userspace side saw many more changes.
> Please note that this is still an RFC, there is room for improvement.
>
> The core idea is adding a new export option: reexport=
> Using reexport= it is possible to mark an export entry in the exports
> file explicitly as an NFS re-export and select a strategy for how
> unique identifiers should be provided.
> "remote-devfsid" is the strategy I proposed in my first patch.
> I understand that this one is dangerous, but I still find it useful
> in some situations.
> "auto-fsidnum" and "predefined-fsidnum" are new and use a SQLite
> database as backend to keep track of generated ids.
> For a more detailed description see patch "exports: Implement new
> export option reexport=".

Thanks, I'll try to take a look.

Before upstreaming, I would like us to pick just one. These kinds of
options tend to complicate testing and documentation and debugging.

For an RFC, though, I think it makes sense, so I'm fine with keeping
"reexport=" while we're still exploring the different options. And,
hey, maybe we end up adding more than one after we've upstreamed the
first one.

> I chose SQLite because nfs-utils already uses it, and with SQL, ids
> can be nicely generated and maintained. It will also scale for large
> setups where the number of subvolumes is high.
>
> Besides id generation, this series also addresses the reboot problem.
> If the re-exporting NFS server reboots, uncovered NFS subvolumes are
> not yet mounted and file handles become stale.
> Now mountd/exportd keeps track of uncovered subvolumes and makes sure
> they are uncovered again when nfsd starts.
>
> The whole set of features is currently opt-in via --enable-reexport.
> I'm also not sure about the arrangement of the reexport code;
> currently it is a helper library.
> Please let me know whether you like this approach.
> If so I'd tidy it up and submit it as non-RFC.
>
> TODOs/Open questions:
> - When re-exporting, fs.nfs.nfs_mountpoint_timeout should be set to 0
>   to make subvolumes not vanish.
>   Is this something exportfs should do automatically when it sees an
>   export entry with a reexport= option?

Setting the timeout to 0 doesn't help with re-export server reboots.
After a reboot is another case where we could end up in a situation
where a client hands us a filehandle for a filesystem that isn't
mounted yet.

I think you want to keep a path with each entry in the database. When
mountd gets a request for a filesystem it hasn't seen before, it stats
that path, which should trigger the automounts.

And it'd be good to have a test case with a client (Linux client or
pynfs) that, say, opens a file several mounts deep, then reboots the
reexport server, then tries to, say, write to the file descriptor
after the reboot. (Or maybe there's a way to force the mounts to
expire as a shortcut instead of doing a full reboot.)

> - exportd saw only minimal testing so far, I wasn't aware of it yet. :-S
> - Currently there is no way to release the shared memory which
>   contains the database lock.
>   I guess it could be released via exportfs -f, which is the very
>   last exec in nfs-server.service.
> - Add a tool to import/export entries from the reexport database
>   which obeys the shared lock.
> - When doing v4->v4 or v3->v4 re-exports, the very first read access
>   to a file blocks for a few seconds until the client does a
>   retransmit.
>   v3->v3 works fine. More investigation needed.

Might want to strace mountd and look at the communication over the
/proc/fs/nfsd/*/channel files, maybe mountd is failing to respond to
an upcall.

--b.

> Looking forward to your feedback!
> Thanks,
> //richard
>
> Richard Weinberger (6):
>   Implement reexport helper library
>   exports: Implement new export option reexport=
>   export: Implement logic behind reexport=
>   export: Record mounted volumes
>   nfsd: statfs() every known subvolume upon start
>   export: Garbage collect orphaned subvolumes upon start
>
>  configure.ac                 |  12 +
>  support/Makefile.am          |   4 +
>  support/export/Makefile.am   |   2 +
>  support/export/cache.c       | 241 +++++++++++++++++-
>  support/export/export.h      |   3 +
>  support/include/nfslib.h     |   1 +
>  support/nfs/Makefile.am      |   1 +
>  support/nfs/exports.c        |  73 ++++++
>  support/reexport/Makefile.am |   6 +
>  support/reexport/reexport.c  | 477 +++++++++++++++++++++++++++++++++++
>  support/reexport/reexport.h  |  53 ++++
>  utils/exportd/Makefile.am    |   8 +-
>  utils/exportd/exportd.c      |  17 ++
>  utils/exportfs/Makefile.am   |   4 +
>  utils/mount/Makefile.am      |   6 +
>  utils/mountd/Makefile.am     |   6 +
>  utils/mountd/mountd.c        |   1 +
>  utils/mountd/svc_run.c       |  18 ++
>  utils/nfsd/Makefile.am       |   6 +
>  utils/nfsd/nfsd.c            |  10 +
>  20 files changed, 934 insertions(+), 15 deletions(-)
>  create mode 100644 support/reexport/Makefile.am
>  create mode 100644 support/reexport/reexport.c
>  create mode 100644 support/reexport/reexport.h
>
> --
> 2.31.1
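The "auto-fsidnum" idea from the cover letter, handing out a stable, unique fsid number per re-exported path from a SQLite table, can be sketched as follows. This is only an illustration of the approach: the table layout, column names, and function names are invented here and are not taken from the actual series (which implements this in C in support/reexport/).

```python
import sqlite3

def open_fsid_db(path):
    """Open (or create) a database mapping re-exported paths to fsid numbers."""
    db = sqlite3.connect(path)
    # AUTOINCREMENT guarantees an id is never reused, even after deletions,
    # so a stale filehandle can never accidentally match a new subvolume.
    db.execute("""CREATE TABLE IF NOT EXISTS fsidnums (
                      fsidnum INTEGER PRIMARY KEY AUTOINCREMENT,
                      path    TEXT UNIQUE NOT NULL)""")
    return db

def get_fsidnum(db, path):
    """Return the stable fsid number for path, allocating one if needed."""
    with db:  # one transaction, so concurrent allocators cannot race
        db.execute("INSERT OR IGNORE INTO fsidnums (path) VALUES (?)", (path,))
        row = db.execute("SELECT fsidnum FROM fsidnums WHERE path = ?",
                         (path,)).fetchone()
    return row[0]
```

Repeated calls for the same path return the same number, so filehandles built from the fsid stay valid across restarts as long as the database file survives; "predefined-fsidnum" would differ only in that the rows are entered by the administrator rather than allocated on demand.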
Bruce,

----- Original Message -----
> From: "bfields" <bfields@fieldses.org>
> To: "richard" <richard@nod.at>
> CC: "linux-nfs" <linux-nfs@vger.kernel.org>, "david" <david@sigma-star.at>,
>     "luis turcitu" <luis.turcitu@appsbroker.com>, "david young"
>     <david.young@appsbroker.com>, "david oberhollenzer"
>     <david.oberhollenzer@sigma-star.at>, "trond myklebust"
>     <trond.myklebust@hammerspace.com>, "anna schumaker"
>     <anna.schumaker@netapp.com>, "chris chilvers"
>     <chris.chilvers@appsbroker.com>
> Sent: Thursday, February 17, 2022 17:33:32
> Subject: Re: [RFC PATCH 0/6] nfs-utils: Improving NFS re-exports

> On Thu, Feb 17, 2022 at 02:15:25PM +0100, Richard Weinberger wrote:
>> This is the second iteration of the NFS re-export improvement series
>> for nfs-utils.
>> While the kernel side didn't change at all and is still small,
>> the userspace side saw many more changes.
>> Please note that this is still an RFC, there is room for improvement.
>>
>> The core idea is adding a new export option: reexport=
>> Using reexport= it is possible to mark an export entry in the exports
>> file explicitly as an NFS re-export and select a strategy for how
>> unique identifiers should be provided.
>> "remote-devfsid" is the strategy I proposed in my first patch.
>> I understand that this one is dangerous, but I still find it useful
>> in some situations.
>> "auto-fsidnum" and "predefined-fsidnum" are new and use a SQLite
>> database as backend to keep track of generated ids.
>> For a more detailed description see patch "exports: Implement new
>> export option reexport=".
>
> Thanks, I'll try to take a look.
>
> Before upstreaming, I would like us to pick just one. These kinds of
> options tend to complicate testing and documentation and debugging.
>
> For an RFC, though, I think it makes sense, so I'm fine with keeping
> "reexport=" while we're still exploring the different options. And,
> hey, maybe we end up adding more than one after we've upstreamed the
> first one.
Which one do you prefer?
"predefined-fsidnum" should be the safest one to start with.

>> - When re-exporting, fs.nfs.nfs_mountpoint_timeout should be set to 0
>>   to make subvolumes not vanish.
>>   Is this something exportfs should do automatically when it sees an
>>   export entry with a reexport= option?
>
> Setting the timeout to 0 doesn't help with re-export server reboots.
> After a reboot is another case where we could end up in a situation
> where a client hands us a filehandle for a filesystem that isn't
> mounted yet.
>
> I think you want to keep a path with each entry in the database. When
> mountd gets a request for a filesystem it hasn't seen before, it stats
> that path, which should trigger the automounts.

I have implemented that already. This feature is part of this series. :-)

> And it'd be good to have a test case with a client (Linux client or
> pynfs) that, say, opens a file several mounts deep, then reboots the
> reexport server, then tries to, say, write to the file descriptor
> after the reboot. (Or maybe there's a way to force the mounts to
> expire as a shortcut instead of doing a full reboot.)

Okay, I'll test that.

>> - exportd saw only minimal testing so far, I wasn't aware of it yet. :-S
>> - Currently there is no way to release the shared memory which
>>   contains the database lock.
>>   I guess it could be released via exportfs -f, which is the very
>>   last exec in nfs-server.service.
>> - Add a tool to import/export entries from the reexport database
>>   which obeys the shared lock.
>> - When doing v4->v4 or v3->v4 re-exports, the very first read access
>>   to a file blocks for a few seconds until the client does a
>>   retransmit.
>>   v3->v3 works fine. More investigation needed.
>
> Might want to strace mountd and look at the communication over the
> /proc/fs/nfsd/*/channel files, maybe mountd is failing to respond to
> an upcall.

Ahh. The upcall might be the issue. Thanks for the pointer.

Thanks,
//richard
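The stat-the-recorded-path recovery discussed above amounts to something like the following sketch. It assumes the reexport database yields a list of subvolume paths; the function name and interface are illustrative only, not taken from the series (which does this in C at nfsd startup).

```python
import os

def revive_known_subvolumes(paths, log=print):
    """At nfsd startup, stat every path recorded in the reexport database.

    On a re-export server, crossing into an NFS subvolume triggers the
    kernel automount, so a plain stat() is enough to get the subvolume
    mounted again before clients present old filehandles for it.
    Returns the list of paths that could be reached.
    """
    revived = []
    for path in paths:
        try:
            os.stat(path)  # crossing the mountpoint triggers the automount
            revived.append(path)
        except OSError as e:
            log(f"could not revive {path}: {e}")
    return revived
```

The same stat can be done lazily instead, when mountd first sees a request for an unknown filesystem, which is the variant Bruce describes.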
On Thu, Feb 17, 2022 at 06:27:15PM +0100, Richard Weinberger wrote:
> > From: "bfields" <bfields@fieldses.org>
> > Thanks, I'll try to take a look.
> >
> > Before upstreaming, I would like us to pick just one. These kinds of
> > options tend to complicate testing and documentation and debugging.
> >
> > For an RFC, though, I think it makes sense, so I'm fine with keeping
> > "reexport=" while we're still exploring the different options. And,
> > hey, maybe we end up adding more than one after we've upstreamed the
> > first one.
>
> Which one do you prefer?
> "predefined-fsidnum" should be the safest one to start with.

I don't know! I'll have to do some more reading and think about it.

> > Setting the timeout to 0 doesn't help with re-export server reboots.
> > After a reboot is another case where we could end up in a situation
> > where a client hands us a filehandle for a filesystem that isn't
> > mounted yet.
> >
> > I think you want to keep a path with each entry in the database.
> > When mountd gets a request for a filesystem it hasn't seen before,
> > it stats that path, which should trigger the automounts.
>
> I have implemented that already. This feature is part of this series. :-)

Oh, good, got it. It'll take me some time to catch up.

--b.
----- Original Message -----
> From: "bfields" <bfields@fieldses.org>
>> Which one do you prefer?
>> "predefined-fsidnum" should be the safest one to start with.
>
> I don't know! I'll have to do some more reading and think about it.

No need to worry, take your time.

>>> Setting the timeout to 0 doesn't help with re-export server reboots.
>>> After a reboot is another case where we could end up in a situation
>>> where a client hands us a filehandle for a filesystem that isn't
>>> mounted yet.
>>>
>>> I think you want to keep a path with each entry in the database.
>>> When mountd gets a request for a filesystem it hasn't seen before,
>>> it stats that path, which should trigger the automounts.
>>
>> I have implemented that already. This feature is part of this series. :-)
>
> Oh, good, got it. It'll take me some time to catch up.

The reason why setting the timeout to 0 is still needed is that when
mountd uncovers a subvolume but no client uses a filehandle on it, the
mount is not pinned and can be unmounted later.
Only when nfsd sees a matching filehandle is the reference counter
increased, so that unmounting is no longer possible.

Thanks,
//richard
On Thu, Feb 17, 2022 at 09:15:38PM +0100, Richard Weinberger wrote:
> ----- Original Message -----
> > From: "bfields" <bfields@fieldses.org>
> >> Which one do you prefer?
> >> "predefined-fsidnum" should be the safest one to start with.
> >
> > I don't know! I'll have to do some more reading and think about it.
>
> No need to worry, take your time.
>
> >>> Setting the timeout to 0 doesn't help with re-export server
> >>> reboots. After a reboot is another case where we could end up in
> >>> a situation where a client hands us a filehandle for a filesystem
> >>> that isn't mounted yet.
> >>>
> >>> I think you want to keep a path with each entry in the database.
> >>> When mountd gets a request for a filesystem it hasn't seen before,
> >>> it stats that path, which should trigger the automounts.
> >>
> >> I have implemented that already. This feature is part of this series. :-)
> >
> > Oh, good, got it. It'll take me some time to catch up.
>
> The reason why setting the timeout to 0 is still needed is that when
> mountd uncovers a subvolume but no client uses a filehandle on it,
> the mount is not pinned and can be unmounted later.
> Only when nfsd sees a matching filehandle is the reference counter
> increased, so that unmounting is no longer possible.

I understand that. But, then if a client comes in with a matching
filehandle, mountd should be able to look up the filesystem and trigger
a new mount, right?

I can imagine that setting the timeout to 0 might be an optimization,
but I'm not seeing why it's required for correct behavior.

--b.
----- Original Message -----
> From: "bfields" <bfields@fieldses.org>
>> The reason why setting the timeout to 0 is still needed is that when
>> mountd uncovers a subvolume but no client uses a filehandle on it,
>> the mount is not pinned and can be unmounted later.
>> Only when nfsd sees a matching filehandle is the reference counter
>> increased, so that unmounting is no longer possible.
>
> I understand that. But, then if a client comes in with a matching
> filehandle, mountd should be able to look up the filesystem and
> trigger a new mount, right?

This does not match what I saw in my experiments. The handle matched
only when the subvolume had been mounted before. But now I understand
your point; in theory it should work. I'll look into this.

Thanks,
//richard
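For concreteness, the timeout under discussion is the fs.nfs.nfs_mountpoint_timeout sysctl, exposed at /proc/sys/fs/nfs/nfs_mountpoint_timeout. A minimal sketch of how a tool like exportfs might pin submounts by setting it to 0 follows; the sysctl path is parameterised here only so the logic can be exercised without a live NFS setup, and whether exportfs should do this automatically is exactly the open question in the thread.

```python
NFS_MOUNTPOINT_TIMEOUT = "/proc/sys/fs/nfs/nfs_mountpoint_timeout"

def set_mountpoint_timeout(seconds, sysctl=NFS_MOUNTPOINT_TIMEOUT):
    """Write fs.nfs.nfs_mountpoint_timeout; 0 stops idle NFS submounts
    from expiring, so uncovered subvolumes do not vanish again."""
    with open(sysctl, "w") as f:
        f.write(f"{seconds}\n")

def get_mountpoint_timeout(sysctl=NFS_MOUNTPOINT_TIMEOUT):
    """Read back the current timeout value (in seconds)."""
    with open(sysctl) as f:
        return int(f.read().strip())
```

Writing the real sysctl requires root; the equivalent one-off shell command would be `sysctl fs.nfs.nfs_mountpoint_timeout=0`.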
Bruce,

----- Original Message -----
> From: "bfields" <bfields@fieldses.org>
> On Thu, Feb 17, 2022 at 02:15:25PM +0100, Richard Weinberger wrote:
>> This is the second iteration of the NFS re-export improvement series
>> for nfs-utils.
>> While the kernel side didn't change at all and is still small,
>> the userspace side saw many more changes.
>> Please note that this is still an RFC, there is room for improvement.
>>
>> The core idea is adding a new export option: reexport=
>> Using reexport= it is possible to mark an export entry in the exports
>> file explicitly as an NFS re-export and select a strategy for how
>> unique identifiers should be provided.
>> "remote-devfsid" is the strategy I proposed in my first patch.
>> I understand that this one is dangerous, but I still find it useful
>> in some situations.
>> "auto-fsidnum" and "predefined-fsidnum" are new and use a SQLite
>> database as backend to keep track of generated ids.
>> For a more detailed description see patch "exports: Implement new
>> export option reexport=".
>
> Thanks, I'll try to take a look.

Did you find some cycles to think about which approach you prefer?

Thanks,
//richard
On Mon, Mar 07, 2022 at 10:25:52AM +0100, Richard Weinberger wrote:
> Did you find some cycles to think about which approach you prefer?
I haven't. I'll try to take a look in the next couple days. Thanks for
the reminder.
--b.
On 3/7/22 5:29 PM, bfields wrote:
> On Mon, Mar 07, 2022 at 10:25:52AM +0100, Richard Weinberger wrote:
>> Did you find some cycles to think about which approach you prefer?
>
> I haven't. I'll try to take a look in the next couple days. Thanks for
> the reminder.

Is there any movement on this? It would be good to have something for
the upcoming Bakeathon... next week.

steved.
----- Original Message -----
> On 3/7/22 5:29 PM, bfields wrote:
>> On Mon, Mar 07, 2022 at 10:25:52AM +0100, Richard Weinberger wrote:
>>> Did you find some cycles to think about which approach you prefer?
>>
>> I haven't. I'll try to take a look in the next couple days. Thanks
>> for the reminder.
>
> Is there any movement on this? It would be good to have something
> for the upcoming Bakeathon... next week.

I'm in the middle of preparing the next version of the series.

Thanks,
//richard