From patchwork Fri Aug 14 04:35:24 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 7012541 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 150619F344 for ; Fri, 14 Aug 2015 07:28:43 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 01DF820788 for ; Fri, 14 Aug 2015 07:28:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E0F2F2066C for ; Fri, 14 Aug 2015 07:28:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753196AbbHNH2j (ORCPT ); Fri, 14 Aug 2015 03:28:39 -0400 Received: from out03.mta.xmission.com ([166.70.13.233]:58011 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751755AbbHNH2j (ORCPT ); Fri, 14 Aug 2015 03:28:39 -0400 Received: from in02.mta.xmission.com ([166.70.13.52]) by out03.mta.xmission.com with esmtps (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ZQ6oq-0007u7-D7; Thu, 13 Aug 2015 22:42:16 -0600 Received: from 67-3-205-173.omah.qwest.net ([67.3.205.173] helo=x220.int.ebiederm.org.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ZQ6oo-0000SO-Bk; Thu, 13 Aug 2015 22:42:16 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Linux Containers Cc: linux-fsdevel@vger.kernel.org, Al Viro , Andy Lutomirski , "Serge E. Hallyn" , Richard Weinberger , Andrey Vagin , Jann Horn , Willy Tarreau , Omar Sandoval , Miklos Szeredi , Linus Torvalds , "J. Bruce Fields" References: <871tncuaf6.fsf@x220.int.ebiederm.org> <87mw5xq7lt.fsf@x220.int.ebiederm.org> <87a8yqou41.fsf_-_@x220.int.ebiederm.org> <874moq9oyb.fsf_-_@x220.int.ebiederm.org> <871tfkawu9.fsf_-_@x220.int.ebiederm.org> <87egjk9i61.fsf_-_@x220.int.ebiederm.org> <20150810043637.GC14139@ZenIV.linux.org.uk> <877foymrwt.fsf@x220.int.ebiederm.org> <87wpwyjxwc.fsf_-_@x220.int.ebiederm.org> Date: Thu, 13 Aug 2015 23:35:24 -0500 In-Reply-To: <87wpwyjxwc.fsf_-_@x220.int.ebiederm.org> (Eric W. Biederman's message of "Thu, 13 Aug 2015 23:29:23 -0500") Message-ID: <87si7mij1v.fsf_-_@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 X-XM-AID: U2FsdGVkX1/WcE+RVfbO+l2evnyTZqrFYOvZVzwEUgk= X-SA-Exim-Connect-IP: 67.3.205.173 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-DCC: XMission; sa05 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Linux Containers X-Spam-Relay-Country: X-Spam-Timing: total 1391 ms - load_scoreonly_sql: 0.10 (0.0%), signal_user_changed: 4.8 (0.3%), b_tie_ro: 3.4 (0.2%), parse: 1.51 (0.1%), extract_message_metadata: 14 (1.0%), get_uri_detail_list: 3.1 (0.2%), tests_pri_-1000: 6 (0.4%), tests_pri_-950: 1.11 (0.1%), tests_pri_-900: 0.94 (0.1%), tests_pri_-400: 29 (2.1%), check_bayes: 28 (2.0%), b_tokenize: 10 (0.7%), b_tok_get_all: 10 (0.7%), b_comp_prob: 2.3 (0.2%), b_tok_touch_all: 3.7 (0.3%), b_finish: 0.68 (0.0%), tests_pri_0: 1327 (95.4%), tests_pri_500: 4.5 (0.3%), rewrite_mail: 0.00 (0.0%) Subject: [PATCH review 7/8] mnt: Track when a directory escapes a bind mount X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When bind mounts are in use, and there is another path to the filesystem it is possible to rename files or directories from a path underneath the root of the bind mount to a path that is not underneath the root of the bind mount. When a directory is moved out from under the root of a bind mount path name lookups that go up the directory tree potentially allow accessing the entire dentry tree of the filesystem. This is not expected, not what is desired and winds up being a secruity problem for userspace. Augment d_move, d_exchange to call d_common_ancestor and handle_possible_mount_escapes to mark any mount points that directories escape from. A few notes on the implementation: - d_splice_alias does not need to be touched as the only case that can result in a directory escaping calls d_move. - Only directory escapes are recorded as only those are relevant to new pathname lookup. Escaped files are handled in prepend_path. - A lock either namespace_sem or mount_lock needs to be held across the duration of renames where a directory could be escaping to ensure that a mount is not added, escaped, and missed during the rename. - The mount_lock is used as it does not sleep. I have audited all of thecallers of d_move and d_exchange and in every instance it appears safe for d_move and d_exchange to start sleeping. But there is no point in adding sleeping behavior if that is unncessary. - The locking order must be mount_lock outside of rename_lock as prepend_path already takes the locks in this order. Signed-off-by: "Eric W. Biederman" --- fs/dcache.c | 33 +++++++++++++++++++++++++++++++++ fs/mount.h | 2 ++ fs/namespace.c | 32 ++++++++++++++++++++++++++++++++ include/linux/mount.h | 1 + 4 files changed, 68 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index 1f2f51055515..7927c1fbdb93 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2704,9 +2704,23 @@ static void __d_move(struct dentry *dentry, struct dentry *target, */ void d_move(struct dentry *dentry, struct dentry *target) { + bool unlock = false; + + if (d_is_dir(dentry) && (dentry->d_parent != target->d_parent)) { + const struct dentry *ancestor; + + ancestor = d_common_ancestor(dentry, target); + read_seqlock_excl(&mount_lock); + unlock = true; + handle_possible_mount_escapee(ancestor, dentry); + } + write_seqlock(&rename_lock); __d_move(dentry, target, false); write_sequnlock(&rename_lock); + if (unlock) + read_sequnlock_excl(&mount_lock); + } EXPORT_SYMBOL(d_move); @@ -2717,6 +2731,23 @@ EXPORT_SYMBOL(d_move); */ void d_exchange(struct dentry *dentry1, struct dentry *dentry2) { + bool d1_is_dir = d_is_dir(dentry1); + bool d2_is_dir = d_is_dir(dentry2); + bool unlock = false; + + if ((d1_is_dir || d2_is_dir) && + (dentry1->d_parent != dentry2->d_parent)) { + const struct dentry *ancestor; + + ancestor = d_common_ancestor(dentry1, dentry2); + read_seqlock_excl(&mount_lock); + unlock = true; + if (d1_is_dir) + handle_possible_mount_escapee(ancestor, dentry1); + if (d2_is_dir) + handle_possible_mount_escapee(ancestor, dentry2); + } + write_seqlock(&rename_lock); WARN_ON(!dentry1->d_inode); @@ -2727,6 +2758,8 @@ void d_exchange(struct dentry *dentry1, struct dentry *dentry2) __d_move(dentry1, dentry2, true); write_sequnlock(&rename_lock); + if (unlock) + read_sequnlock_excl(&mount_lock); } /** diff --git a/fs/mount.h b/fs/mount.h index e8f22970fe59..ad91963c83ac 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -107,6 +107,8 @@ static inline void detach_mounts(struct dentry *dentry) __detach_mounts(dentry); } +extern void handle_possible_mount_escapee(const struct dentry *, struct dentry *); + static inline void get_mnt_ns(struct mnt_namespace *ns) { atomic_inc(&ns->count); diff --git a/fs/namespace.c b/fs/namespace.c index af6abf476394..ddcd0b61a448 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1657,6 +1657,38 @@ out_unlock: namespace_unlock(); } +static void mark_escaped_mounts(struct dentry *root) +{ + /* Must be called with mount_lock held */ + struct mountroot *mr; + struct mount *mnt; + + mr = lookup_mountroot(root); + if (mr) { + /* Mark each mount from which a directory is escaping. + */ + hlist_for_each_entry(mnt, &mr->r_list, mnt_mr_list) + mnt->mnt.mnt_flags |= MNT_DIR_ESCAPED; + } +} + +void handle_possible_mount_escapee(const struct dentry *ancestor, + struct dentry *escapee) +{ + struct dentry *dentry; + + for (dentry = escapee->d_parent; dentry != ancestor; + dentry = dentry->d_parent) { + + if (d_mountroot(dentry)) + mark_escaped_mounts(dentry); + + /* In case there is no common ancestor */ + if (IS_ROOT(dentry)) + break; + } +} + /* * Is the caller allowed to modify his namespace? */ diff --git a/include/linux/mount.h b/include/linux/mount.h index f822c3c11377..e58bc12b19aa 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -62,6 +62,7 @@ struct mnt_namespace; #define MNT_SYNC_UMOUNT 0x2000000 #define MNT_MARKED 0x4000000 #define MNT_UMOUNT 0x8000000 +#define MNT_DIR_ESCAPED 0x10000000 struct vfsmount { struct dentry *mnt_root; /* root of the mounted tree */