Message ID | 87lgpkgwrw.fsf@notabene.neil.brown.name (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 26.05.17 12:46, NeilBrown (neilb@suse.com) wrote: > > Hi all, > it appears that systemd doesn't play well with NFS "bg" mounts. > I can see a few options for how to address this and wonder if anyone > has any opinions. Yeah, this has come up before. Long story short: "bg" is simply not compatible with systemd. We assume that the /bin/mount's effect is visible in /proc/self/mountinfo, and everything else is considered a bug, i.e. /bin/mount lying to us. And I think that's a pretty rational assumption and requirement to make. I am not particularly convinced the "bg" usecase is really such a great idea, since it is necessarily racy: you never know whether mount is actually in effect or not, even though /bin/mount claims so. I am pretty sure other options (such as autofs mounts, which are dead-easy to use in system: just replace the "bg" mount option in fstab by "x-systemd.automount") are much better approaches to the problem at hand: they also make your local system less dependent on remote network access, but they do give proper guarantees about their runtime: when the autofs mount is established the path is available. Hence I am tempted to treat the issue as a documentation and warning issue: accept that "bg" is not supported, but document this better. In addition, we should probably log about "bg" being broken in the fstab-generator. I file a bug about that now: https://github.com/systemd/systemd/issues/6046 > This is better, but the background mount.nfs can persist for a long > time. I don't think it persists forever, but at least half an hour I > think. > > When the foo.mount unit is stopped, the mount.nfs process isn't > killed. Hmm, if you see this, then this would be a bug: mount units that are stopped should not leave processes around. > I don't think this is a major problem, but it is unfortunate and could > be confusing. During testing I've had multiple mount.nfs background > processes all attached to the one .mount unit. Humpf, could you file a bug? While I think the "bg" concept is broken, as discussed above, having FUSE mounts with processes in the background is actually supported, and we should clean them up properly when the mount unit is stopped. Hmm, maybe mount.nfs isn't properly killable? i.e. systemd tries to kill it, but it refuses to be killed? > What should we do about bg NFS mounts with systemd? > Some options: > - declare "bg" to be not-supported. If you don't need the filesystem > to be present for boot, then use x-systemd.automount, or some other > automount scheme. > If we did this, we probably need to make it very obvious that "bg" > mounts aren't supported - maybe a log message that appears when > you do "systemctl status ..." ?? I am all for this, as suggested above. I'd only log from fstab-generator though. (That said, if we want something stronger, we could also add the fact that we stripped "bg" from the mount optoins to the DEscription= of the generated mount unit.) > - decide that "bg" is really just a lame attempt at automounting, and > that now we have real automounting, "bg" can be translated to that. > So systemd-fstab-generator would treat "bg" like > "x-systemd.automount" and otherwise strip it from the list of > options. I am a bit afraid of this I must say. The semantics are different enough to likely cause more problems then we'd solve with this. Not supporting this at all sounds like the much better approach here: let's strip "bg" when specified. > - do our best to really support "bg". That means, at least, applying > a much larger timeout to "bg" mounts, and preferably killing any > background processes when a mount unit is stopped. Below is a > little patch which does this last bit, but I'm not sure it is generally > safe. As mentioned I think this would just trade one race for a couple of new ones, and that appears to be a bad idea to me. > A side question is: should this knowledge about NFS be encoded in > systemd, or should nfs-utils add the necessary knowledge? I am pretty sure we should keep special understanding of NFS at a minimum in PID 1, but I think we can be less strict in fstab-generator, as its primary job is compat with UNIX anyway. > > i.e. we could add an nfs-fstab-generator to nfs-utils which creates > drop-ins to modify the behaviour of the drop-ins provided by > systemd-fstab-generator. > Adding the TimeoutSec= would be easy. Stripping the "bg" would be > possible. > Changing the remote-fs.target.requires/foo.mount symlink to be > remote-fs.target.requires/foo.automount would be problematic > though. Well, I'd be fine with letting NFS do its own handling of the NFS /etc/fstab entries, but I think the special casing of "bg" is fine to simply add to the existing generator in systemd. > Could we teach systemd-fstab-generator to ignore $TYPE filesystems > if TYPE-fstab-generator existed? Well, generators can override each other in very limited ways (as there are three different output directories a gnerator can write to, which are inserted at different places in the unit file search path), we could build on that. That said, I think adding this to fstab-generator in systemd is OK. > Or should we just build all this filesystem-specific knowledge into > systemd? For now, I think adding this to systemd's fstab-generator would be fine. Hope this makes sense, Lennart
On Mon, May 29 2017, Lennart Poettering wrote: > On Fri, 26.05.17 12:46, NeilBrown (neilb@suse.com) wrote: > >> >> Hi all, >> it appears that systemd doesn't play well with NFS "bg" mounts. >> I can see a few options for how to address this and wonder if anyone >> has any opinions. > > Yeah, this has come up before. Long story short: "bg" is simply not > compatible with systemd. We assume that the /bin/mount's effect is > visible in /proc/self/mountinfo, and everything else is considered a > bug, i.e. /bin/mount lying to us. And I think that's a pretty rational > assumption and requirement to make. > > I am not particularly convinced the "bg" usecase is really such a > great idea, since it is necessarily racy: you never know whether mount > is actually in effect or not, even though /bin/mount claims so. I am > pretty sure other options (such as autofs mounts, which are dead-easy > to use in system: just replace the "bg" mount option in fstab by > "x-systemd.automount") are much better approaches to the problem at > hand: they also make your local system less dependent on remote > network access, but they do give proper guarantees about their > runtime: when the autofs mount is established the path is available. > > Hence I am tempted to treat the issue as a documentation and warning > issue: accept that "bg" is not supported, but document this better. In > addition, we should probably log about "bg" being broken in the > fstab-generator. I file a bug about that now: > > https://github.com/systemd/systemd/issues/6046 There is a weird distorted sort of justice here. When NFS first appeared, it broke various long-standing Unix practices, such as O_EXCL|O_CREAT for lock files. Now systemd appears and breaks a long-standing NFS practice: bg mounts. I hoped we could find a way to make them work, but I won't cry over their demise. I much prefer automount .... I think all NFS mounts should be automounts. I see this is already documented in systemd.mount: The NFS mount option bg for NFS background mounts as documented in nfs(5) is not supported in /etc/fstab entries. I wonder how many people actually read that. We should probably add symmetric documentation to nfs(5) Both should give clear pointers to x-systemd.automount. > >> This is better, but the background mount.nfs can persist for a long >> time. I don't think it persists forever, but at least half an hour I >> think. >> >> When the foo.mount unit is stopped, the mount.nfs process isn't >> killed. > > Hmm, if you see this, then this would be a bug: mount units that are > stopped should not leave processes around. > >> I don't think this is a major problem, but it is unfortunate and could >> be confusing. During testing I've had multiple mount.nfs background >> processes all attached to the one .mount unit. > > Humpf, could you file a bug? https://github.com/systemd/systemd/issues/6048 > > While I think the "bg" concept is broken, as discussed above, having > FUSE mounts with processes in the background is actually supported, > and we should clean them up properly when the mount unit is stopped. > > Hmm, maybe mount.nfs isn't properly killable? i.e. systemd tries to > kill it, but it refuses to be killed? mount.nfs responds cleanly to SIGTERM. > >> What should we do about bg NFS mounts with systemd? >> Some options: >> - declare "bg" to be not-supported. If you don't need the filesystem >> to be present for boot, then use x-systemd.automount, or some other >> automount scheme. >> If we did this, we probably need to make it very obvious that "bg" >> mounts aren't supported - maybe a log message that appears when >> you do "systemctl status ..." ?? > > I am all for this, as suggested above. I'd only log from > fstab-generator though. (That said, if we want something stronger, we > could also add the fact that we stripped "bg" from the mount optoins > to the DEscription= of the generated mount unit.) That last bit sounds like a very good idea. Stripping "bg" could be seen as a "surprising" thing for fstab-generator to do. Making it as obvious as possible to the sys-admin would be a good thing (and would probably make support personnel happy too). > >> - decide that "bg" is really just a lame attempt at automounting, and >> that now we have real automounting, "bg" can be translated to that. >> So systemd-fstab-generator would treat "bg" like >> "x-systemd.automount" and otherwise strip it from the list of >> options. > > I am a bit afraid of this I must say. The semantics are different > enough to likely cause more problems then we'd solve with this. Not > supporting this at all sounds like the much better approach here: > let's strip "bg" when specified. > >> - do our best to really support "bg". That means, at least, applying >> a much larger timeout to "bg" mounts, and preferably killing any >> background processes when a mount unit is stopped. Below is a >> little patch which does this last bit, but I'm not sure it is generally >> safe. > > As mentioned I think this would just trade one race for a couple of > new ones, and that appears to be a bad idea to me. > >> A side question is: should this knowledge about NFS be encoded in >> systemd, or should nfs-utils add the necessary knowledge? > > I am pretty sure we should keep special understanding of NFS at a > minimum in PID 1, but I think we can be less strict in > fstab-generator, as its primary job is compat with UNIX anyway. I was thinking about which source package the knowledge would be in, and hence which set of maintainers would over-see it. I don't expect systemd maintainers to be fully in-touch with the details of NFS, but then NFS developers cannot be fully in-touch with how systemd works. Apart from some documentation changes, we probably don't need to put anything new in nfs-utils at the moment. > >> >> i.e. we could add an nfs-fstab-generator to nfs-utils which creates >> drop-ins to modify the behaviour of the drop-ins provided by >> systemd-fstab-generator. >> Adding the TimeoutSec= would be easy. Stripping the "bg" would be >> possible. >> Changing the remote-fs.target.requires/foo.mount symlink to be >> remote-fs.target.requires/foo.automount would be problematic >> though. > > Well, I'd be fine with letting NFS do its own handling of the NFS > /etc/fstab entries, but I think the special casing of "bg" is fine to > simply add to the existing generator in systemd. > >> Could we teach systemd-fstab-generator to ignore $TYPE filesystems >> if TYPE-fstab-generator existed? > > Well, generators can override each other in very limited ways (as > there are three different output directories a gnerator can write to, > which are inserted at different places in the unit file search path), > we could build on that. That said, I think adding this to > fstab-generator in systemd is OK. Ahh, of course. That is what normal-dir / early-dir / late-dir is for. I'll keep that in mind in case I do ever need it. > >> Or should we just build all this filesystem-specific knowledge into >> systemd? > > For now, I think adding this to systemd's fstab-generator would be fine. > > Hope this makes sense, Yes it does. It wasn't the outcome I was hoping for, but it is hard to argue against it. Thanks a lot, NeilBrown
diff --git a/src/core/mount.c b/src/core/mount.c index ca0c4b0d5eed..91939b48d11a 100644 --- a/src/core/mount.c +++ b/src/core/mount.c @@ -883,6 +883,18 @@ static void mount_enter_unmounting(Mount *m) { MOUNT_UNMOUNTING_SIGKILL)) m->n_retry_umount = 0; + if (m->result == MOUNT_SUCCESS && + !m->from_proc_self_mountinfo) { + /* There is no mountpoint, but mount seemed to succeed. + * Could be a bg mount.nfs. + * In any case, kill any processes that might be hanging + * around, they cannot be doing anything useful. + */ + sd_bus_error error = SD_BUS_ERROR_NULL; + unit_kill_common(UNIT(m), KILL_ALL, SIGTERM, -1, -1, &error); + } + + m->control_command_id = MOUNT_EXEC_UNMOUNT; m->control_command = m->exec_command + MOUNT_EXEC_UNMOUNT;