From patchwork Thu Oct 13 17:14:55 2016

From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrei Vagin
Cc: Alexander Viro, containers@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v3] mount: dont execute propagate_umount() many times for same mounts
Date: Thu, 13 Oct 2016 12:14:55 -0500
Message-ID: <877f9c6ui8.fsf@x220.int.ebiederm.org>
In-Reply-To: <1476141965-21429-1-git-send-email-avagin@openvz.org>
	(Andrei Vagin's message of "Mon, 10 Oct 2016 16:26:05 -0700")
References: <1476141965-21429-1-git-send-email-avagin@openvz.org>

Andrei Vagin writes:

> The reason for this optimization is that umount() can hold namespace_sem
> for a long time; this semaphore is global, so it affects all users.
> Recently Eric W. Biederman added a per mount namespace limit on the
> number of mounts.  The default number of mounts allowed per mount
> namespace is 100,000.
> Currently this value allows constructing a tree which requires hours
> to be umounted.
>
> In the worst case the current complexity of umount_tree() is O(n^3):
>
> * Enumerate all mounts in a target tree (propagate_umount)
> * Enumerate mounts to find where these changes have to be propagated
>   (mark_umount_candidates)
> * Enumerate mounts to find a required mount by parent and dentry
>   (__lookup_mnt_last)
>
> The worst case is when all mounts from the tree live in the same
> shared group.  In this case we have to enumerate all mounts on each
> step.
>
> Here we can optimize the second step.  We don't need to do it for
> mounts which we have already met when we did this step for previous
> mounts.  It reduces the complexity of umount_tree() to O(n^2).

To O(n), not O(n^2).  A hash table lookup (aka __lookup_mnt() and
friends) is O(1), or the hash table is malfunctioning.  Please don't
call it O(n^2).

Arguably we are getting into sizes where the mount hash table fills up
and is on the edge of malfunctioning, but that is not particularly
relevant to this case.  What your patch is aiming to do is to take an
O(n^2) algorithm and make it O(n).  That is very much worth doing.

However your patch confuses two separate issues:

- Marking mounts that may be unmounted.
- Marking pieces of the propagation tree that have already been
  traversed.

I do not see anything requiring propagation trees to intersect at the
set of mounts that are unmounted in umount_tree before propagate_umount
is called, which means there are topologies where we can and should do
better than your patch.

I am also bothered that your patch changes how we look up the mount
mounted on a mount point (aka playing with __lookup_mnt_last).  There
is no reason to do that to solve the problem, and I think it obscures
what is actually going on.

I am going to see if I can rework your basic concept with explicit
marking of the propagation tree.  In the meantime, for people who want
to see what your patch is doing, the version below does essentially
the same thing without the extra, essentially meaningless, loop.
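To make the O(n^2) versus O(n) point concrete before the patch itself,
here is a minimal userspace sketch, not kernel code; every name in it
(toy_mount, propagate_from, umount_all) is invented for illustration.
It models the degenerate case from the patch description, n mounts in a
single shared peer group, and counts how many peers get walked with and
without remembering which mounts have already been traversed:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_mount {
	bool marked;	/* "propagation from here was already handled" */
};

/* Propagate from mount i: walk every peer in the shared group once. */
static unsigned long propagate_from(struct toy_mount *m, int n, int i,
				    bool use_mark)
{
	unsigned long steps = 0;
	int j;

	if (use_mark && m[i].marked)
		return 0;	/* already traversed: nothing more to do */

	for (j = 0; j < n; j++) {
		steps++;
		m[j].marked = true;
	}
	return steps;
}

/* Unmount all n mounts, propagating from each one in turn. */
static unsigned long umount_all(int n, bool use_mark)
{
	struct toy_mount *m = calloc(n, sizeof(*m));
	unsigned long steps = 0;
	int i;

	if (!m)
		return 0;
	for (i = 0; i < n; i++)
		steps += propagate_from(m, n, i, use_mark);
	free(m);
	return steps;
}

int main(void)
{
	int n = 10000;

	printf("no mark:   %lu steps\n", umount_all(n, false)); /* ~n*n */
	printf("with mark: %lu steps\n", umount_all(n, true));  /* ~n   */
	return 0;
}

With n = 10000 the unmarked run walks 100,000,000 peers while the
marked run walks 10,000, which is the whole point of the marking.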
Eric

---
diff --git a/fs/namespace.c b/fs/namespace.c
index 8183fba9ab4d..33a76ee1b76b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -650,13 +650,11 @@ struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
 	p = __lookup_mnt(mnt, dentry);
 	if (!p)
 		goto out;
-	if (!(p->mnt.mnt_flags & MNT_UMOUNT))
-		res = p;
+	res = p;
 	hlist_for_each_entry_continue(p, mnt_hash) {
 		if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
 			break;
-		if (!(p->mnt.mnt_flags & MNT_UMOUNT))
-			res = p;
+		res = p;
 	}
 out:
 	return res;
diff --git a/fs/pnode.c b/fs/pnode.c
index 234a9ac49958..ade5e7d8308c 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -399,11 +399,18 @@ static void mark_umount_candidates(struct mount *mnt)
 
 	BUG_ON(parent == mnt);
 
+	if (IS_MNT_MARKED(mnt))
+		return;
+
+	SET_MNT_MARK(mnt);
+
 	for (m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {
 		struct mount *child = __lookup_mnt_last(&m->mnt,
 						mnt->mnt_mountpoint);
-		if (child && (!IS_MNT_LOCKED(child) || IS_MNT_MARKED(m))) {
+		if (child && ((child->mnt.mnt_flags & MNT_UMOUNT) ||
+			      !IS_MNT_LOCKED(child) ||
+			      IS_MNT_MARKED(m))) {
 			SET_MNT_MARK(child);
 		}
 	}
@@ -420,6 +427,11 @@ static void __propagate_umount(struct mount *mnt)
 
 	BUG_ON(parent == mnt);
 
+	if (!IS_MNT_MARKED(mnt))
+		return;
+
+	CLEAR_MNT_MARK(mnt);
+
 	for (m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {
 
@@ -432,7 +444,8 @@ static void __propagate_umount(struct mount *mnt)
 		if (!child || !IS_MNT_MARKED(child))
 			continue;
 		CLEAR_MNT_MARK(child);
-		if (list_empty(&child->mnt_mounts)) {
+		if (!(child->mnt.mnt_flags & MNT_UMOUNT) &&
+		    list_empty(&child->mnt_mounts)) {
 			list_del_init(&child->mnt_child);
 			child->mnt.mnt_flags |= MNT_UMOUNT;
 			list_move_tail(&child->mnt_list, &mnt->mnt_list);
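For readers skimming the first hunk: after it is applied,
__lookup_mnt_last stops filtering out MNT_UMOUNT entries and simply
returns the last matching mount, leaving the MNT_UMOUNT handling to the
callers in fs/pnode.c.  Reconstructed from the hunk's context lines, the
function would look roughly like this; the declaration line at the top
is outside the hunk and therefore assumed:

struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
{
	/* Declaration assumed; it is not part of the hunk above. */
	struct mount *p, *res = NULL;

	/* Hash lookup of the first mount on (mnt, dentry): O(1). */
	p = __lookup_mnt(mnt, dentry);
	if (!p)
		goto out;
	res = p;
	/*
	 * Continue along the hash chain, keeping the last entry mounted
	 * on the same mountpoint, regardless of MNT_UMOUNT.
	 */
	hlist_for_each_entry_continue(p, mnt_hash) {
		if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
			break;
		res = p;
	}
out:
	return res;
}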