From patchwork Wed May 17 05:54:34 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman"
X-Patchwork-Id: 9730069
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A162E60363
	for ; Wed, 17 May 2017 06:01:12 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90EB62679B
	for ; Wed, 17 May 2017 06:01:12 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 8243F26907; Wed, 17 May 2017 06:01:12 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 85AB62679B
	for ; Wed, 17 May 2017 06:01:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752633AbdEQGBK (ORCPT ); Wed, 17 May 2017 02:01:10 -0400
Received: from out03.mta.xmission.com ([166.70.13.233]:43184 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751987AbdEQGBI (ORCPT );
	Wed, 17 May 2017 02:01:08 -0400
Received: from in01.mta.xmission.com ([166.70.13.51])
	by out03.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87) (envelope-from ) id 1dAs1D-0002gp-6P;
	Wed, 17 May 2017 00:01:07 -0600
Received: from 97-121-81-159.omah.qwest.net ([97.121.81.159] helo=x220.xmission.com)
	by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87) (envelope-from ) id 1dAs16-0008MN-Kw;
	Wed, 17 May 2017 00:01:06 -0600
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrei Vagin
Cc: Al Viro , , Ram Pai
References: <874m1hdkyv.fsf@xmission.com>
	<20170103014806.GA1555@ZenIV.linux.org.uk>
	<87ful07ryd.fsf@xmission.com>
	<20170103040052.GB1555@ZenIV.linux.org.uk>
	<87y3yr32ig.fsf@xmission.com>
	<87shoz32g8.fsf_-_@xmission.com>
	<87a8b6r0z5.fsf_-_@xmission.com>
	<20170514021504.GA19495@outlook.office365.com>
	<87inl4ozg2.fsf@xmission.com>
	<87y3tzoklh.fsf@xmission.com>
	<20170515182704.GA15539@outlook.office365.com>
	<87a86dj49b.fsf@xmission.com>
	<871srpj2yp.fsf_-_@xmission.com>
Date: Wed, 17 May 2017 00:54:34 -0500
In-Reply-To: <871srpj2yp.fsf_-_@xmission.com> (Eric W. Biederman's message of
	"Mon, 15 May 2017 15:10:38 -0500")
Message-ID: <87efvodo4l.fsf_-_@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
X-XM-SPF: eid=1dAs16-0008MN-Kw; ; ; mid=<87efvodo4l.fsf_-_@xmission.com>; ; ;
	hst=in01.mta.xmission.com; ; ; ip=97.121.81.159; ; ;
	frm=ebiederm@xmission.com; ; ; spf=neutral
X-XM-AID: U2FsdGVkX1+vzvu7z5QGeeRb8ROSJ3MlgYXOgX3ZayM=
X-SA-Exim-Connect-IP: 97.121.81.159
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: [REVIEW][PATCH 1/2] mnt: In propagate_umount handle visiting mounts in any order
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

While investigating some poor umount performance, I realized that in
the case of overlapping mount trees where some of the mounts are locked,
the code has been failing to unmount all of the mounts it should have
been unmounting.

This failure to unmount all of the necessary mounts can be reproduced
with:

$ cat locked_mounts_test.sh

mount -t tmpfs test-base /mnt
mount --make-shared /mnt
mkdir -p /mnt/b

mount -t tmpfs test1 /mnt/b
mount --make-shared /mnt/b
mkdir -p /mnt/b/10

mount -t tmpfs test2 /mnt/b/10
mount --make-shared /mnt/b/10
mkdir -p /mnt/b/10/20

mount --rbind /mnt/b /mnt/b/10/20

unshare -Urm --propagation unchanged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi' &
sleep 1
umount -l /mnt/b
wait %%

$ unshare -Urm ./locked_mounts_test.sh

This failure is corrected by removing the prepass that marks mounts
that may be unmounted.

A first pass is added that unmounts mounts where possible. Where that
is not possible, it sets the mount mark on mounts that could be
unmounted if they were not locked, and adds them to a list of umount
candidates. This first pass reconsiders a mount's parent if the parent
is on the list of umount candidates, ensuring that information about
unmountability passes from child to parent.

A second pass then walks through all mounts that were unmounted and
processes their children, either unmounting them or marking them for
reparenting.

A last pass cleans up the state on the mounts that could not be
unmounted and, if applicable, reparents them to their first parent
that remained mounted.

While a bit longer than the old code, this code is much more robust:
it allows information to flow up from the leaves and down from the
trunk, making the order in which mounts are encountered in the umount
propagation tree irrelevant.

Cc: stable@vger.kernel.org
Fixes: 0c56fe31420c ("mnt: Don't propagate unmounts to locked mounts")
Signed-off-by: "Eric W. Biederman"
---
 fs/mount.h     |   2 +-
 fs/namespace.c |   2 +-
 fs/pnode.c     | 144 ++++++++++++++++++++++++++++++++++-----------------------
 3 files changed, 88 insertions(+), 60 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index ede5a1d5cf99..de45d9e76748 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -58,7 +58,7 @@ struct mount {
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	struct mountpoint *mnt_mp;	/* where is it mounted */
 	struct hlist_node mnt_mp_list;	/* list mounts with the same mountpoint */
-	struct list_head mnt_reparent;	/* reparent list entry */
+	struct list_head mnt_umounting; /* list entry for umount propagation */
 #ifdef CONFIG_FSNOTIFY
 	struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
 	__u32 mnt_fsnotify_mask;
diff --git a/fs/namespace.c b/fs/namespace.c
index 51e49866e1fe..5e3dcbeb1de5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -236,7 +236,7 @@ static struct mount *alloc_vfsmnt(const char *name)
 		INIT_LIST_HEAD(&mnt->mnt_slave_list);
 		INIT_LIST_HEAD(&mnt->mnt_slave);
 		INIT_HLIST_NODE(&mnt->mnt_mp_list);
-		INIT_LIST_HEAD(&mnt->mnt_reparent);
+		INIT_LIST_HEAD(&mnt->mnt_umounting);
 		init_fs_pin(&mnt->mnt_umount, drop_mountpoint);
 	}
 	return mnt;
diff --git a/fs/pnode.c b/fs/pnode.c
index 52aca0a118ff..fbaca7df2eb0 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -413,86 +413,95 @@ void propagate_mount_unlock(struct mount *mnt)
 	}
 }
 
-/*
- * Mark all mounts that the MNT_LOCKED logic will allow to be unmounted.
- */
-static void mark_umount_candidates(struct mount *mnt)
+static void umount_one(struct mount *mnt, struct list_head *to_umount)
 {
-	struct mount *parent = mnt->mnt_parent;
-	struct mount *m;
-
-	BUG_ON(parent == mnt);
-
-	for (m = propagation_next(parent, parent); m;
-			m = propagation_next(m, parent)) {
-		struct mount *child = __lookup_mnt(&m->mnt,
-						mnt->mnt_mountpoint);
-		if (!child || (child->mnt.mnt_flags & MNT_UMOUNT))
-			continue;
-		if (!IS_MNT_LOCKED(child) || IS_MNT_MARKED(m)) {
-			SET_MNT_MARK(child);
-		}
-	}
+	CLEAR_MNT_MARK(mnt);
+	mnt->mnt.mnt_flags |= MNT_UMOUNT;
+	list_del_init(&mnt->mnt_child);
+	list_del_init(&mnt->mnt_umounting);
+	list_move_tail(&mnt->mnt_list, to_umount);
 }
 
 /*
  * NOTE: unmounting 'mnt' naturally propagates to all other mounts its
  * parent propagates to.
  */
-static void __propagate_umount(struct mount *mnt, struct list_head *to_reparent)
+static bool __propagate_umount(struct mount *mnt,
+			       struct list_head *to_umount,
+			       struct list_head *to_restore)
 {
-	struct mount *parent = mnt->mnt_parent;
-	struct mount *m;
+	bool progress = false;
+	struct mount *child;
 
-	BUG_ON(parent == mnt);
+	/*
+	 * The state of the parent won't change if this mount is
+	 * already unmounted or marked as without children.
+	 */
+	if (mnt->mnt.mnt_flags & (MNT_UMOUNT | MNT_MARKED))
+		goto out;
 
-	for (m = propagation_next(parent, parent); m;
-			m = propagation_next(m, parent)) {
-		struct mount *topper;
-		struct mount *child = __lookup_mnt(&m->mnt,
-						mnt->mnt_mountpoint);
-		/*
-		 * umount the child only if the child has no children
-		 * and the child is marked safe to unmount.
-		 */
-		if (!child || !IS_MNT_MARKED(child))
+	/* Verify topper is the only grandchild that has not been
+	 * speculatively unmounted.
+	 */
+	list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
+		if (child->mnt_mountpoint == mnt->mnt.mnt_root)
 			continue;
-		CLEAR_MNT_MARK(child);
+		if (!list_empty(&child->mnt_umounting) && IS_MNT_MARKED(child))
+			continue;
+		/* Found a mounted child */
+		goto children;
+	}
 
-		/* If there is exactly one mount covering all of child
-		 * replace child with that mount.
-		 */
-		topper = find_topper(child);
-		if (topper)
-			list_add_tail(&topper->mnt_reparent, to_reparent);
+	/* Mark mounts that can be unmounted if not locked */
+	SET_MNT_MARK(mnt);
+	progress = true;
 
-		if (topper || list_empty(&child->mnt_mounts)) {
-			list_del_init(&child->mnt_child);
-			list_del_init(&child->mnt_reparent);
-			child->mnt.mnt_flags |= MNT_UMOUNT;
-			list_move_tail(&child->mnt_list, &mnt->mnt_list);
+	/* If a mount is without children and not locked umount it. */
+	if (!IS_MNT_LOCKED(mnt)) {
+		umount_one(mnt, to_umount);
+	} else {
+children:
+		list_move_tail(&mnt->mnt_umounting, to_restore);
+	}
+out:
+	return progress;
+}
+
+static void umount_list(struct list_head *to_umount,
+			struct list_head *to_restore)
+{
+	struct mount *mnt, *child, *tmp;
+	list_for_each_entry(mnt, to_umount, mnt_list) {
+		list_for_each_entry_safe(child, tmp, &mnt->mnt_mounts, mnt_child) {
+			/* topper? */
+			if (child->mnt_mountpoint == mnt->mnt.mnt_root)
+				list_move_tail(&child->mnt_umounting, to_restore);
+			else
+				umount_one(child, to_umount);
 		}
 	}
 }
 
-static void reparent_mounts(struct list_head *to_reparent)
+static void restore_mounts(struct list_head *to_restore)
 {
-	while (!list_empty(to_reparent)) {
+	/* Restore mounts to a clean working state */
+	while (!list_empty(to_restore)) {
 		struct mount *mnt, *parent;
 		struct mountpoint *mp;
 
-		mnt = list_first_entry(to_reparent, struct mount, mnt_reparent);
-		list_del_init(&mnt->mnt_reparent);
+		mnt = list_first_entry(to_restore, struct mount, mnt_umounting);
+		CLEAR_MNT_MARK(mnt);
+		list_del_init(&mnt->mnt_umounting);
 
-		/* Where should this mount be reparented to? */
+		/* Should this mount be reparented? */
 		mp = mnt->mnt_mp;
 		parent = mnt->mnt_parent;
 		while (parent->mnt.mnt_flags & MNT_UMOUNT) {
 			mp = parent->mnt_mp;
 			parent = parent->mnt_parent;
 		}
-
-		mnt_change_mountpoint(parent, mp, mnt);
+		if (parent != mnt->mnt_parent)
+			mnt_change_mountpoint(parent, mp, mnt);
 	}
 }
 
@@ -506,15 +515,34 @@ static void reparent_mounts(struct list_head *to_reparent)
 int propagate_umount(struct list_head *list)
 {
 	struct mount *mnt;
-	LIST_HEAD(to_reparent);
+	LIST_HEAD(to_restore);
+	LIST_HEAD(to_umount);
 
-	list_for_each_entry_reverse(mnt, list, mnt_list)
-		mark_umount_candidates(mnt);
+	list_for_each_entry(mnt, list, mnt_list) {
+		struct mount *parent = mnt->mnt_parent;
+		struct mount *m;
 
-	list_for_each_entry(mnt, list, mnt_list)
-		__propagate_umount(mnt, &to_reparent);
+		for (m = propagation_next(parent, parent); m;
+		     m = propagation_next(m, parent)) {
+			struct mount *child = __lookup_mnt(&m->mnt,
+							mnt->mnt_mountpoint);
+			if (!child)
+				continue;
+
+			/* Check the child and parents while progress is made */
+			while (__propagate_umount(child,
+						  &to_umount, &to_restore)) {
+				/* Is the parent a umount candidate? */
+				child = child->mnt_parent;
+				if (list_empty(&child->mnt_umounting))
+					break;
+			}
+		}
+	}
 
-	reparent_mounts(&to_reparent);
+	umount_list(&to_umount, &to_restore);
+	restore_mounts(&to_restore);
+	list_splice_tail(&to_umount, list);
 
 	return 0;
 }
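
For readers following the last pass: the reparenting walk in
restore_mounts() can be modeled in userspace roughly as below. This is
only a sketch under stated assumptions. struct toy_mount,
first_mounted_parent() and needs_reparent() are hypothetical names made
up for illustration and are not kernel code. The sketch also leaves out
the mountpoint (mp) that the real loop carries along, and it assumes the
chain of parents always ends at a mount that is still mounted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mount; not the kernel type. */
struct toy_mount {
	struct toy_mount *parent;	/* models mnt_parent */
	bool umounted;			/* models the MNT_UMOUNT flag */
};

/*
 * Walk up from mnt until an ancestor that is still mounted is found.
 * Assumption: some ancestor is still mounted, so the loop terminates.
 */
static struct toy_mount *first_mounted_parent(const struct toy_mount *mnt)
{
	struct toy_mount *parent;

	assert(mnt != NULL && mnt->parent != NULL);
	parent = mnt->parent;
	while (parent->umounted)
		parent = parent->parent;
	return parent;
}

/* Models the "parent != mnt->mnt_parent" test added by the patch. */
static bool needs_reparent(const struct toy_mount *mnt)
{
	return first_mounted_parent(mnt) != mnt->parent;
}
```

With a chain root <- a <- b <- c in which a and b are unmounted,
first_mounted_parent(&c) yields root and needs_reparent(&c) is true. A
mount whose direct parent is still mounted is left where it is, which
is the case the new check in restore_mounts() skips.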