Message ID | 87lgtn167n.fsf@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote: > Ram Pai <linuxram@us.ibm.com> writes: > > > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote: > >> ebiederm@xmission.com (Eric W. Biederman) writes: > >> > >> > Ram Pai <linuxram@us.ibm.com> writes: > >> > > >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote: > >> >>> Ram Pai <linuxram@us.ibm.com> writes: > >> >>> > >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) > >> >>> >> > >> >>> >> for (m = propagation_next(parent, parent); m; > >> >>> >> m = propagation_next(m, parent)) { > >> >>> >> - child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint); > >> >>> >> - if (child && list_empty(&child->mnt_mounts) && > >> >>> >> - (ret = do_refcount_check(child, 1))) > >> >>> >> - break; > >> >>> >> + int count = 1; > >> >>> >> + child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint); > >> >>> >> + if (!child) > >> >>> >> + continue; > >> >>> >> + > >> >>> >> + /* Is there exactly one mount on the child that covers > >> >>> >> + * it completely whose reference should be ignored? > >> >>> >> + */ > >> >>> >> + topper = find_topper(child); > >> >>> > > >> >>> > This is tricky. I understand it is trying to identify the case where a > >> >>> > mount got tucked-in because of propagation. But this will not > >> >>> > distinguish the case where a mount got over-mounted genuinely, not because of > >> >>> > propagation, but because of explicit user action. > >> >>> > > >> >>> > > >> >>> > example: > >> >>> > > >> >>> > case 1: (explicit user action) > >> >>> > B is a slave of A > >> >>> > mount something on A/a , it will propagate to B/a > >> >>> > and than mount something on B/a > >> >>> > > >> >>> > case 2: (tucked mount) > >> >>> > B is a slave of A > >> >>> > mount something on B/a > >> >>> > and than mount something on A/a > >> >>> > > >> >>> > Both case 1 and case 2 lead to the same mount configuration. > >> >>> > > >> >>> > > >> >>> > however 'umount A/a' in case 1 should fail. > >> >>> > and 'umount A/a' in case 2 should pass. > >> >>> > > >> >>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2). > >> >>> > whereas umounts of mounts on which overmounts exist should > >> >>> > fail.(case 1) > >> >>> > >> >>> Looking at your example. I agree that case 1 will fail today. > >> >> > >> >> And should continue to fail. right? Your semantics change will pass it. > >> > > >> > I don't see why it should continue to fail. > >> > > >> >>> However my actual expectation would be for both mount configurations > >> >>> to behave the same. In both cases something has been explicitly mounted > >> >>> on B/a and something has propagated to B/a. In both cases the mount > >> >>> on top is what was explicitly mounted, and the mount below is what was > >> >>> propagated to B/a. > >> >>> > >> >>> I don't see why the order of operations should matter. > >> >> > >> >> One of the subtle expectation is reversibility. > >> >> > >> >> Mount followed immediately by unmount has always passed and that is the > >> >> standard expectation always. Your proposed code will ensure that. > >> >> > >> >> However there is one other subtle expectaton. > >> >> > >> >> A mount cannot disappear if a user has explicitly mounted on top of it. > >> >> > >> >> your proposed code will not meet that expectation. > >> >> > >> >> In other words, these two expectations make it behave differently even > >> >> when; arguably, they feel like the same configuration. > >> > > >> > I am not seeing that. > >> > > >> > > >> > > >> >>> > >> >>> > maybe we need a flag to identify tucked mounts? > >> >>> > >> >>> To preserve our exact current semantics yes. > >> >>> > >> >>> The mount configurations that are delibearately constructed that I am > >> >>> aware of are comparatively simple. I don't think anyone has even taken > >> >>> advantage of the shadow/side mounts at this point. I made a reasonable > >> >>> effort to find out and no one was even aware they existed. Much less > >> >>> what they were. And certainly no one I talked to could find code that > >> >>> used them. > >> >> > >> >> But someday; even if its after a decade, someone ;) will > >> >> stumble into this semantics and wonder 'why?'. Its better to get it right > >> >> sooner. Sorry, I am blaming myself; for keeping some of the problems > >> >> open thinking no one will bump into them. > >> > > >> > Oh definitely. If we have people ready to talk it through I am happy to > >> > dot as many i's and cross as many t's as we productively can. > >> > > >> > I was just pointing out that I don't have any reason to expect that any > >> > one depends on the subtle details of the implementation today so we > >> > still have some wiggle room to fix them. Even if they are visible to > >> > user space. > >> > >> So I haven't seen a reply, and we are getting awfully close to the merge > >> window. Is there anything concrete we can do to ease concerns? > >> > >> Right now I am thinking my last version of the patch is the likely the > >> best we have time and energy to manage and it would be good to merge it > >> before the code bit rots. > > > > I was waiting for some other opinions on the behavior, since I > > continue to think that 'one should not be able to unmount mounts on > > which a user has explicitly mounted upon'. I am happy to be overruled, > > since your patch significantly improves the rest of the semantics. > > > > Viro? > > Ram Pai, just to be clear you were hoping to add the logic below to my patch? Yes. the behavior of your patch below is what I was proposing. > > My objections to the snippet below are: > > - It makes it hard for the CRIU folks (yet more state they have to find > and restore). true. unfortunately one more subtle detail to be aware off. > > - It feels subjectively worse to me. > > - We already have cases where mounts are unmounted transparently (umount on rmdir). sorry. i am not aware of this case. some details will help. > > - Al Viro claims that the side/shadow mounts are ordinary mounts and > maintaining this extra logic that remembers if we tucked one mount > under another seems to make this them less ordinary. I tend to argue that they are a bit more than ordinary, for they have the ability to tuck. > > - The symmetry for unmounting exists for a tucked mount. We can unmount > it via propagation or we can unmount the mount above it, and then we > can unmount the new underlying mount. this is fine with me. > So I don't see why we don't > want symmetry in the other case just because we mounted on top of > the mount and rather than had the mount tucked under us. A tucked mount should be un-tuckable. I agree. But a non-tucked mount cannot pretend to be tucked and this is where I disagree. > > diff --git a/fs/namespace.c b/fs/namespace.c > index 8bfad42c1ccf..8b00e0548438 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2047,8 +2047,10 @@ static int attach_recursive_mnt(struct mount *source_mnt, > hlist_del_init(&child->mnt_hash); > q = __lookup_mnt(&child->mnt_parent->mnt, > child->mnt_mountpoint); > - if (q) > + if (q) { > mnt_change_mountpoint(child, smp, q); > + child->mnt.mnt_flags |= MNT_TUCKED; > + } > commit_tree(child); > } > put_mountpoint(smp); > diff --git a/fs/pnode.c b/fs/pnode.c > index 5bc7896d122a..e2a6ac68feb9 100644 > --- a/fs/pnode.c > +++ b/fs/pnode.c > @@ -327,6 +327,9 @@ static struct mount *find_topper(struct mount *mnt) > /* If there is exactly one mount covering mnt completely return it. */ > struct mount *child; > > + if (!(mnt->mnt.mnt_flags & MNT_TUCKED)) > + return NULL; > + > if (!list_is_singular(&mnt->mnt_mounts)) > return NULL; > > diff --git a/include/linux/mount.h b/include/linux/mount.h > index 8e0352af06b7..25ca398b19b3 100644 > --- a/include/linux/mount.h > +++ b/include/linux/mount.h > @@ -52,6 +52,7 @@ struct mnt_namespace; > > #define MNT_INTERNAL 0x4000 > > +#define MNT_TUCKED 0x020000 > #define MNT_LOCK_ATIME 0x040000 > #define MNT_LOCK_NOEXEC 0x080000 > #define MNT_LOCK_NOSUID 0x100000 > > Eric
Ram Pai <linuxram@us.ibm.com> writes: > On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote: >> Ram Pai <linuxram@us.ibm.com> writes: >> >> > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote: >> >> ebiederm@xmission.com (Eric W. Biederman) writes: >> >> >> >> > Ram Pai <linuxram@us.ibm.com> writes: >> >> > >> >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote: >> >> >>> Ram Pai <linuxram@us.ibm.com> writes: >> >> >>> >> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) >> >> >>> >> >> >> >>> >> for (m = propagation_next(parent, parent); m; >> >> >>> >> m = propagation_next(m, parent)) { >> >> >>> >> - child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint); >> >> >>> >> - if (child && list_empty(&child->mnt_mounts) && >> >> >>> >> - (ret = do_refcount_check(child, 1))) >> >> >>> >> - break; >> >> >>> >> + int count = 1; >> >> >>> >> + child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint); >> >> >>> >> + if (!child) >> >> >>> >> + continue; >> >> >>> >> + >> >> >>> >> + /* Is there exactly one mount on the child that covers >> >> >>> >> + * it completely whose reference should be ignored? >> >> >>> >> + */ >> >> >>> >> + topper = find_topper(child); >> >> >>> > >> >> >>> > This is tricky. I understand it is trying to identify the case where a >> >> >>> > mount got tucked-in because of propagation. But this will not >> >> >>> > distinguish the case where a mount got over-mounted genuinely, not because of >> >> >>> > propagation, but because of explicit user action. >> >> >>> > >> >> >>> > >> >> >>> > example: >> >> >>> > >> >> >>> > case 1: (explicit user action) >> >> >>> > B is a slave of A >> >> >>> > mount something on A/a , it will propagate to B/a >> >> >>> > and than mount something on B/a >> >> >>> > >> >> >>> > case 2: (tucked mount) >> >> >>> > B is a slave of A >> >> >>> > mount something on B/a >> >> >>> > and than mount something on A/a >> >> >>> > >> >> >>> > Both case 1 and case 2 lead to the same mount configuration. >> >> >>> > >> >> >>> > >> >> >>> > however 'umount A/a' in case 1 should fail. >> >> >>> > and 'umount A/a' in case 2 should pass. >> >> >>> > >> >> >>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2). >> >> >>> > whereas umounts of mounts on which overmounts exist should >> >> >>> > fail.(case 1) >> >> >>> >> >> >>> Looking at your example. I agree that case 1 will fail today. >> >> >> >> >> >> And should continue to fail. right? Your semantics change will pass it. >> >> > >> >> > I don't see why it should continue to fail. >> >> > >> >> >>> However my actual expectation would be for both mount configurations >> >> >>> to behave the same. In both cases something has been explicitly mounted >> >> >>> on B/a and something has propagated to B/a. In both cases the mount >> >> >>> on top is what was explicitly mounted, and the mount below is what was >> >> >>> propagated to B/a. >> >> >>> >> >> >>> I don't see why the order of operations should matter. >> >> >> >> >> >> One of the subtle expectation is reversibility. >> >> >> >> >> >> Mount followed immediately by unmount has always passed and that is the >> >> >> standard expectation always. Your proposed code will ensure that. >> >> >> >> >> >> However there is one other subtle expectaton. >> >> >> >> >> >> A mount cannot disappear if a user has explicitly mounted on top of it. >> >> >> >> >> >> your proposed code will not meet that expectation. >> >> >> >> >> >> In other words, these two expectations make it behave differently even >> >> >> when; arguably, they feel like the same configuration. >> >> > >> >> > I am not seeing that. >> >> > >> >> > >> >> > >> >> >>> >> >> >>> > maybe we need a flag to identify tucked mounts? >> >> >>> >> >> >>> To preserve our exact current semantics yes. >> >> >>> >> >> >>> The mount configurations that are delibearately constructed that I am >> >> >>> aware of are comparatively simple. I don't think anyone has even taken >> >> >>> advantage of the shadow/side mounts at this point. I made a reasonable >> >> >>> effort to find out and no one was even aware they existed. Much less >> >> >>> what they were. And certainly no one I talked to could find code that >> >> >>> used them. >> >> >> >> >> >> But someday; even if its after a decade, someone ;) will >> >> >> stumble into this semantics and wonder 'why?'. Its better to get it right >> >> >> sooner. Sorry, I am blaming myself; for keeping some of the problems >> >> >> open thinking no one will bump into them. >> >> > >> >> > Oh definitely. If we have people ready to talk it through I am happy to >> >> > dot as many i's and cross as many t's as we productively can. >> >> > >> >> > I was just pointing out that I don't have any reason to expect that any >> >> > one depends on the subtle details of the implementation today so we >> >> > still have some wiggle room to fix them. Even if they are visible to >> >> > user space. >> >> >> >> So I haven't seen a reply, and we are getting awfully close to the merge >> >> window. Is there anything concrete we can do to ease concerns? >> >> >> >> Right now I am thinking my last version of the patch is the likely the >> >> best we have time and energy to manage and it would be good to merge it >> >> before the code bit rots. >> > >> > I was waiting for some other opinions on the behavior, since I >> > continue to think that 'one should not be able to unmount mounts on >> > which a user has explicitly mounted upon'. I am happy to be overruled, >> > since your patch significantly improves the rest of the semantics. >> > >> > Viro? >> >> Ram Pai, just to be clear you were hoping to add the logic below to my patch? > > Yes. the behavior of your patch below is what I was proposing. > >> >> My objections to the snippet below are: >> >> - It makes it hard for the CRIU folks (yet more state they have to find >> and restore). > > true. unfortunately one more subtle detail to be aware off. A bit more than that, as it means that it requires an almost exact playback of the sequence of mounts in all mount namespaces to get to the point of reproducing a mount namespace. >> - It feels subjectively worse to me. >> >> - We already have cases where mounts are unmounted transparently (umount on rmdir). > > sorry. i am not aware of this case. some details will help. The question: What happens when we rmdir a directory that has a mount on it in another mount namespace? What happens when someone on the nfs server deletes a directory there is a mount on? It used to be that we returned -EBUSY, and refused the rmdir operation, and we lied in the vfs about the nfs dentry being deleted to preserve the mount. In recent kernels I have done the work so that we transparently unmount the mounts and allow the rmdir to happen. An unprivileged user mounting over say glibc and blocking the yum update of it is a pretty serious bug. >> - Al Viro claims that the side/shadow mounts are ordinary mounts and >> maintaining this extra logic that remembers if we tucked one mount >> under another seems to make this them less ordinary. > > I tend to argue that they are a bit more than ordinary, for they have the > ability to tuck. > >> >> - The symmetry for unmounting exists for a tucked mount. We can unmount >> it via propagation or we can unmount the mount above it, and then we >> can unmount the new underlying mount. > > this is fine with me. > >> So I don't see why we don't >> want symmetry in the other case just because we mounted on top of >> the mount and rather than had the mount tucked under us. > > A tucked mount should be un-tuckable. I agree. But a non-tucked mount > cannot pretend to be tucked and this is where I disagree. I have always seen the question as: Should a mount that is propagated be unmountable via umount propagation. Which leads me to think that allowing the umount propagation when it won't change the applications view of files and filesystems is a good thing. From my perspective it also better preserves the reversability property that is important. The mount propgated and now the unmount propagated. From a system management point of view one of the largest practical problems with mount namespaces and mount propagation is: mounts that propagate into another mount namespaces but don't get unmounted. Which is to say not unmounting something (especially silently) and leaving the filesystem busy when something could be unmounted is a practical problem for people. I am going to be out for a week, and I am leaving in a few minutes. So I am going to push my patch to the my for-next branch, so there is a reasonable chance of merging things when the merge window opens. If the feedback is to add the MNT_TUCKED annotations to make the patch suitable for merging to Linus's tree I will take care of that when I get back. Eric
On Sat, Feb 04, 2017 at 09:58:39AM +1300, Eric W. Biederman wrote: > Ram Pai <linuxram@us.ibm.com> writes: > > > On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote: > >> Ram Pai <linuxram@us.ibm.com> writes: > >> > >> > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote: > >> >> ebiederm@xmission.com (Eric W. Biederman) writes: > >> >> > >> >> > Ram Pai <linuxram@us.ibm.com> writes: > >> >> > > >> >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote: > >> >> >>> Ram Pai <linuxram@us.ibm.com> writes: > >> >> >>> > >> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) > >> >> >>> >> > >> >> >>> >> for (m = propagation_next(parent, parent); m; > >> >> >>> >> m = propagation_next(m, parent)) { > >> >> >>> >> - child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint); > >> >> >>> >> - if (child && list_empty(&child->mnt_mounts) && > >> >> >>> >> - (ret = do_refcount_check(child, 1))) > >> >> >>> >> - break; > >> >> >>> >> + int count = 1; > >> >> >>> >> + child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint); > >> >> >>> >> + if (!child) > >> >> >>> >> + continue; > >> >> >>> >> + > >> >> >>> >> + /* Is there exactly one mount on the child that covers > >> >> >>> >> + * it completely whose reference should be ignored? > >> >> >>> >> + */ > >> >> >>> >> + topper = find_topper(child); > >> >> >>> > > >> >> >>> > This is tricky. I understand it is trying to identify the case where a > >> >> >>> > mount got tucked-in because of propagation. But this will not > >> >> >>> > distinguish the case where a mount got over-mounted genuinely, not because of > >> >> >>> > propagation, but because of explicit user action. > >> >> >>> > > >> >> >>> > > >> >> >>> > example: > >> >> >>> > > >> >> >>> > case 1: (explicit user action) > >> >> >>> > B is a slave of A > >> >> >>> > mount something on A/a , it will propagate to B/a > >> >> >>> > and than mount something on B/a > >> >> >>> > > >> >> >>> > case 2: (tucked mount) > >> >> >>> > B is a slave of A > >> >> >>> > mount something on B/a > >> >> >>> > and than mount something on A/a > >> >> >>> > > >> >> >>> > Both case 1 and case 2 lead to the same mount configuration. > >> >> >>> > > >> >> >>> > > >> >> >>> > however 'umount A/a' in case 1 should fail. > >> >> >>> > and 'umount A/a' in case 2 should pass. > >> >> >>> > > >> >> >>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2). > >> >> >>> > whereas umounts of mounts on which overmounts exist should > >> >> >>> > fail.(case 1) > >> >> >>> > >> >> >>> Looking at your example. I agree that case 1 will fail today. > >> >> >> > >> >> >> And should continue to fail. right? Your semantics change will pass it. > >> >> > > >> >> > I don't see why it should continue to fail. > >> >> > > >> >> >>> However my actual expectation would be for both mount configurations > >> >> >>> to behave the same. In both cases something has been explicitly mounted > >> >> >>> on B/a and something has propagated to B/a. In both cases the mount > >> >> >>> on top is what was explicitly mounted, and the mount below is what was > >> >> >>> propagated to B/a. > >> >> >>> > >> >> >>> I don't see why the order of operations should matter. > >> >> >> > >> >> >> One of the subtle expectation is reversibility. > >> >> >> > >> >> >> Mount followed immediately by unmount has always passed and that is the > >> >> >> standard expectation always. Your proposed code will ensure that. > >> >> >> > >> >> >> However there is one other subtle expectaton. > >> >> >> > >> >> >> A mount cannot disappear if a user has explicitly mounted on top of it. > >> >> >> > >> >> >> your proposed code will not meet that expectation. > >> >> >> > >> >> >> In other words, these two expectations make it behave differently even > >> >> >> when; arguably, they feel like the same configuration. > >> >> > > >> >> > I am not seeing that. > >> >> > > >> >> > > >> >> > > >> >> >>> > >> >> >>> > maybe we need a flag to identify tucked mounts? > >> >> >>> > >> >> >>> To preserve our exact current semantics yes. > >> >> >>> > >> >> >>> The mount configurations that are delibearately constructed that I am > >> >> >>> aware of are comparatively simple. I don't think anyone has even taken > >> >> >>> advantage of the shadow/side mounts at this point. I made a reasonable > >> >> >>> effort to find out and no one was even aware they existed. Much less > >> >> >>> what they were. And certainly no one I talked to could find code that > >> >> >>> used them. > >> >> >> > >> >> >> But someday; even if its after a decade, someone ;) will > >> >> >> stumble into this semantics and wonder 'why?'. Its better to get it right > >> >> >> sooner. Sorry, I am blaming myself; for keeping some of the problems > >> >> >> open thinking no one will bump into them. > >> >> > > >> >> > Oh definitely. If we have people ready to talk it through I am happy to > >> >> > dot as many i's and cross as many t's as we productively can. > >> >> > > >> >> > I was just pointing out that I don't have any reason to expect that any > >> >> > one depends on the subtle details of the implementation today so we > >> >> > still have some wiggle room to fix them. Even if they are visible to > >> >> > user space. > >> >> > >> >> So I haven't seen a reply, and we are getting awfully close to the merge > >> >> window. Is there anything concrete we can do to ease concerns? > >> >> > >> >> Right now I am thinking my last version of the patch is the likely the > >> >> best we have time and energy to manage and it would be good to merge it > >> >> before the code bit rots. > >> > > >> > I was waiting for some other opinions on the behavior, since I > >> > continue to think that 'one should not be able to unmount mounts on > >> > which a user has explicitly mounted upon'. I am happy to be overruled, > >> > since your patch significantly improves the rest of the semantics. > >> > > >> > Viro? > >> > >> Ram Pai, just to be clear you were hoping to add the logic below to my patch? > > > > Yes. the behavior of your patch below is what I was proposing. > > > >> > >> My objections to the snippet below are: > >> > >> - It makes it hard for the CRIU folks (yet more state they have to find > >> and restore). > > > > true. unfortunately one more subtle detail to be aware off. > > A bit more than that, as it means that it requires an almost exact > playback of the sequence of mounts in all mount namespaces to > get to the point of reproducing a mount namespace. Currently dump and restore of mount namespaces is the most complicated part of CRIU. The main problem is that we don't know how a tree of mounts was created. Mounts have two types of relationships: child<->parent and shared groups. Currently both this relationships can't be restored directly. We can not add a mount into an existing group, the group can be only inherited from a source mount. And we can not restore "tucked" mounts, which can be only appeared due to propagation. Now we don't know an algorithm to dump and restore any set of mount points for a reqsonable time. The problem becomes more complex if we start thinking how to restore mount namespaces which lives in different user namespaces. This patch from Eric together with my patch https://lkml.org/lkml/2017/1/23/712 can solve the problem of dumping and restoring mount namespaces. > > >> - It feels subjectively worse to me. > >> > >> - We already have cases where mounts are unmounted transparently (umount on rmdir). > > > > sorry. i am not aware of this case. some details will help. > > The question: > > What happens when we rmdir a directory that has a mount on it in another > mount namespace? > > What happens when someone on the nfs server deletes a directory there > is a mount on? > > > It used to be that we returned -EBUSY, and refused the rmdir operation, > and we lied in the vfs about the nfs dentry being deleted to preserve > the mount. > > In recent kernels I have done the work so that we transparently unmount > the mounts and allow the rmdir to happen. An unprivileged user mounting > over say glibc and blocking the yum update of it is a pretty serious > bug. > > >> - Al Viro claims that the side/shadow mounts are ordinary mounts and > >> maintaining this extra logic that remembers if we tucked one mount > >> under another seems to make this them less ordinary. > > > > I tend to argue that they are a bit more than ordinary, for they have the > > ability to tuck. > > > >> > >> - The symmetry for unmounting exists for a tucked mount. We can unmount > >> it via propagation or we can unmount the mount above it, and then we > >> can unmount the new underlying mount. > > > > this is fine with me. > > > >> So I don't see why we don't > >> want symmetry in the other case just because we mounted on top of > >> the mount and rather than had the mount tucked under us. > > > > A tucked mount should be un-tuckable. I agree. But a non-tucked mount > > cannot pretend to be tucked and this is where I disagree. > > I have always seen the question as: Should a mount that is propagated be > unmountable via umount propagation. > > Which leads me to think that allowing the umount propagation when it > won't change the applications view of files and filesystems is a good > thing. From my perspective it also better preserves the reversability > property that is important. The mount propgated and now the unmount > propagated. > > > From a system management point of view one of the largest practical > problems with mount namespaces and mount propagation is: mounts that > propagate into another mount namespaces but don't get unmounted. > > > Which is to say not unmounting something (especially silently) and > leaving the filesystem busy when something could be unmounted is a > practical problem for people. > > > I am going to be out for a week, and I am leaving in a few minutes. > So I am going to push my patch to the my for-next branch, so there > is a reasonable chance of merging things when the merge window opens. > > If the feedback is to add the MNT_TUCKED annotations to make the patch > suitable for merging to Linus's tree I will take care of that when > I get back. > > Eric > > >
On Sun, Feb 05, 2017 at 07:25:09PM -0800, Andrei Vagin wrote: > On Sat, Feb 04, 2017 at 09:58:39AM +1300, Eric W. Biederman wrote: > > Ram Pai <linuxram@us.ibm.com> writes: > > > > > On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote: > > >> Ram Pai <linuxram@us.ibm.com> writes: > > >> > > >> > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote: > > >> >> ebiederm@xmission.com (Eric W. Biederman) writes: > > >> >> > > >> >> > Ram Pai <linuxram@us.ibm.com> writes: > > >> >> > > > >> >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote: > > >> >> >>> Ram Pai <linuxram@us.ibm.com> writes: > > >> >> >>> > > >> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) > > >> >> >>> >> ....snip.... > > > > A bit more than that, as it means that it requires an almost exact > > playback of the sequence of mounts in all mount namespaces to > > get to the point of reproducing a mount namespace. > > Currently dump and restore of mount namespaces is the most complicated > part of CRIU. The main problem is that we don't know how a tree of > mounts was created. Mounts have two types of relationships: > child<->parent and shared groups. Currently both this relationships > can't be restored directly. We can not add a mount into an existing > group, the group can be only inherited from a source mount. And we can > not restore "tucked" mounts, which can be only appeared due to > propagation. > > Now we don't know an algorithm to dump and restore any set of mount > points for a reqsonable time. The problem becomes more complex if we > start thinking how to restore mount namespaces which lives in different > user namespaces. > > > This patch from Eric together with my patch https://lkml.org/lkml/2017/1/23/712 > can solve the problem of dumping and restoring mount namespaces. Lets say we exposed the tucked flag for the mount in the /proc/*/mountinfo, than there will be no need to know the history. Its just a matter or setting that flag as part of your new MS_SET_GROUP mount() call. right? Looking at the patch at https://lkml.org/lkml/2017/1/23/712, I am guessing the CRUI dumps information from mountinfo to restore it later. RP
On Mon, Feb 06, 2017 at 01:40:53PM -0800, Ram Pai wrote: > On Sun, Feb 05, 2017 at 07:25:09PM -0800, Andrei Vagin wrote: > > On Sat, Feb 04, 2017 at 09:58:39AM +1300, Eric W. Biederman wrote: > > > Ram Pai <linuxram@us.ibm.com> writes: > > > > > > > On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote: > > > >> Ram Pai <linuxram@us.ibm.com> writes: > > > >> > > > >> > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote: > > > >> >> ebiederm@xmission.com (Eric W. Biederman) writes: > > > >> >> > > > >> >> > Ram Pai <linuxram@us.ibm.com> writes: > > > >> >> > > > > >> >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote: > > > >> >> >>> Ram Pai <linuxram@us.ibm.com> writes: > > > >> >> >>> > > > >> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) > > > >> >> >>> >> > ....snip.... > > > > > > A bit more than that, as it means that it requires an almost exact > > > playback of the sequence of mounts in all mount namespaces to > > > get to the point of reproducing a mount namespace. > > > > > Currently dump and restore of mount namespaces is the most complicated > > part of CRIU. The main problem is that we don't know how a tree of > > mounts was created. Mounts have two types of relationships: > > child<->parent and shared groups. Currently both this relationships > > can't be restored directly. We can not add a mount into an existing > > group, the group can be only inherited from a source mount. And we can > > not restore "tucked" mounts, which can be only appeared due to Sorry, I used a wrong term. I mean "shadow" mounts. > > propagation. > > > > Now we don't know an algorithm to dump and restore any set of mount > > points for a reqsonable time. The problem becomes more complex if we > > start thinking how to restore mount namespaces which lives in different > > user namespaces. > > > > > > This patch from Eric together with my patch https://lkml.org/lkml/2017/1/23/712 > > can solve the problem of dumping and restoring mount namespaces. > > Lets say we exposed the tucked flag for the mount in the > /proc/*/mountinfo, than there will be no need to know the history. > Its just a matter or setting that flag as part of > your new MS_SET_GROUP mount() call. right? This is right. > > Looking at the patch at https://lkml.org/lkml/2017/1/23/712, I am > guessing the CRUI dumps information from mountinfo to restore it > later. > > RP >
diff --git a/fs/namespace.c b/fs/namespace.c index 8bfad42c1ccf..8b00e0548438 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2047,8 +2047,10 @@ static int attach_recursive_mnt(struct mount *source_mnt, hlist_del_init(&child->mnt_hash); q = __lookup_mnt(&child->mnt_parent->mnt, child->mnt_mountpoint); - if (q) + if (q) { mnt_change_mountpoint(child, smp, q); + child->mnt.mnt_flags |= MNT_TUCKED; + } commit_tree(child); } put_mountpoint(smp); diff --git a/fs/pnode.c b/fs/pnode.c index 5bc7896d122a..e2a6ac68feb9 100644 --- a/fs/pnode.c +++ b/fs/pnode.c @@ -327,6 +327,9 @@ static struct mount *find_topper(struct mount *mnt) /* If there is exactly one mount covering mnt completely return it. */ struct mount *child; + if (!(mnt->mnt.mnt_flags & MNT_TUCKED)) + return NULL; + if (!list_is_singular(&mnt->mnt_mounts)) return NULL; diff --git a/include/linux/mount.h b/include/linux/mount.h index 8e0352af06b7..25ca398b19b3 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -52,6 +52,7 @@ struct mnt_namespace; #define MNT_INTERNAL 0x4000 +#define MNT_TUCKED 0x020000 #define MNT_LOCK_ATIME 0x040000 #define MNT_LOCK_NOEXEC 0x080000 #define MNT_LOCK_NOSUID 0x100000