From patchwork Sat Aug 15 18:37:13 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman"
X-Patchwork-Id: 7021301
From: ebiederm@xmission.com (Eric W. Biederman)
To: Linux Containers
Cc: linux-fsdevel@vger.kernel.org, Al Viro, Andy Lutomirski,
 "Serge E. Hallyn", Richard Weinberger, Andrey Vagin, Jann Horn,
 Willy Tarreau, Omar Sandoval, Miklos Szeredi, Linus Torvalds,
 "J. Bruce Fields"
References: <871tncuaf6.fsf@x220.int.ebiederm.org>
 <87mw5xq7lt.fsf@x220.int.ebiederm.org>
 <87a8yqou41.fsf_-_@x220.int.ebiederm.org>
 <874moq9oyb.fsf_-_@x220.int.ebiederm.org>
 <871tfkawu9.fsf_-_@x220.int.ebiederm.org>
 <87egjk9i61.fsf_-_@x220.int.ebiederm.org>
 <20150810043637.GC14139@ZenIV.linux.org.uk>
 <877foymrwt.fsf@x220.int.ebiederm.org>
 <87wpwyjxwc.fsf_-_@x220.int.ebiederm.org>
 <87fv3mjxsc.fsf_-_@x220.int.ebiederm.org>
 <20150815061617.GG14139@ZenIV.linux.org.uk>
 <874mk08l3g.fsf@x220.int.ebiederm.org>
 <87a8ts763c.fsf_-_@x220.int.ebiederm.org>
Date: Sat, 15 Aug 2015 13:37:13 -0500
In-Reply-To: <87a8ts763c.fsf_-_@x220.int.ebiederm.org> (Eric W. Biederman's
 message of "Sat, 15 Aug 2015 13:35:19 -0500")
Message-ID: <87si7k5rfq.fsf_-_@x220.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Subject: [PATCH review 3/7] mnt: Track which mounts use a dentry as root.
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is needed infrastructure for better handling of when files or
directories are moved out from under the root of a bind mount.

Signed-off-by: "Eric W. Biederman"
---
 fs/mount.h             |   7 +++
 fs/namespace.c         | 120 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/dcache.h |   7 +++
 3 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 14db05d424f7..e8f22970fe59 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -27,6 +27,12 @@ struct mountpoint {
 	int m_count;
 };
 
+struct mountroot {
+	struct hlist_node r_hash;
+	struct dentry *r_dentry;
+	struct hlist_head r_list;
+};
+
 struct mount {
 	struct hlist_node mnt_hash;
 	struct mount *mnt_parent;
@@ -55,6 +61,7 @@ struct mount {
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	struct mountpoint *mnt_mp;	/* where is it mounted */
 	struct hlist_node mnt_mp_list;	/* list mounts with the same mountpoint */
+	struct hlist_node mnt_mr_list;	/* list mounts with the same mountroot */
 #ifdef CONFIG_FSNOTIFY
 	struct hlist_head mnt_fsnotify_marks;
 	__u32 mnt_fsnotify_mask;
diff --git a/fs/namespace.c b/fs/namespace.c
index c7cb8a526c05..af6abf476394 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -31,6 +31,8 @@ static unsigned int m_hash_mask __read_mostly;
 static unsigned int m_hash_shift __read_mostly;
 static unsigned int mp_hash_mask __read_mostly;
 static unsigned int mp_hash_shift __read_mostly;
+static unsigned int mr_hash_mask __read_mostly;
+static unsigned int mr_hash_shift __read_mostly;
 
 static __initdata unsigned long mhash_entries;
 static int __init set_mhash_entries(char *str)
@@ -52,6 +54,16 @@ static int __init set_mphash_entries(char *str)
 }
 __setup("mphash_entries=", set_mphash_entries);
 
+static __initdata unsigned long mrhash_entries;
+static int __init set_mrhash_entries(char *str)
+{
+	if (!str)
+		return 0;
+	mrhash_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("mrhash_entries=", set_mrhash_entries);
+
 static u64 event;
 static DEFINE_IDA(mnt_id_ida);
 static DEFINE_IDA(mnt_group_ida);
@@ -61,6 +73,7 @@ static int mnt_group_start = 1;
 
 static struct hlist_head *mount_hashtable __read_mostly;
 static struct hlist_head *mountpoint_hashtable __read_mostly;
+static struct hlist_head *mountroot_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
 static DECLARE_RWSEM(namespace_sem);
 
@@ -93,6 +106,13 @@ static inline struct hlist_head *mp_hash(struct dentry *dentry)
 	return &mountpoint_hashtable[tmp & mp_hash_mask];
 }
 
+static inline struct hlist_head *mr_hash(struct dentry *dentry)
+{
+	unsigned long tmp = ((unsigned long)dentry / L1_CACHE_BYTES);
+	tmp = tmp + (tmp >> mr_hash_shift);
+	return &mountroot_hashtable[tmp & mr_hash_mask];
+}
+
 /*
  * allocation is serialized by namespace_sem, but we need the spinlock to
  * serialize with freeing.
@@ -234,6 +254,7 @@ static struct mount *alloc_vfsmnt(const char *name)
 		INIT_LIST_HEAD(&mnt->mnt_slave_list);
 		INIT_LIST_HEAD(&mnt->mnt_slave);
 		INIT_HLIST_NODE(&mnt->mnt_mp_list);
+		INIT_HLIST_NODE(&mnt->mnt_mr_list);
 #ifdef CONFIG_FSNOTIFY
 		INIT_HLIST_HEAD(&mnt->mnt_fsnotify_marks);
 #endif
@@ -779,6 +800,77 @@ static void put_mountpoint(struct mountpoint *mp)
 	}
 }
 
+static struct mountroot *lookup_mountroot(struct dentry *dentry)
+{
+	struct hlist_head *chain = mr_hash(dentry);
+	struct mountroot *mr;
+
+	hlist_for_each_entry(mr, chain, r_hash) {
+		if (mr->r_dentry == dentry)
+			return mr;
+	}
+	return NULL;
+}
+
+static int mnt_set_root(struct mount *mnt, struct dentry *root)
+{
+	struct mountroot *mr = NULL;
+
+	read_seqlock_excl(&mount_lock);
+	if (d_mountroot(root))
+		mr = lookup_mountroot(root);
+	if (!mr) {
+		struct mountroot *new;
+		read_sequnlock_excl(&mount_lock);
+
+		new = kmalloc(sizeof(struct mountroot), GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+
+		read_seqlock_excl(&mount_lock);
+		mr = lookup_mountroot(root);
+		if (mr) {
+			kfree(new);
+		} else {
+			struct hlist_head *chain = mr_hash(root);
+
+			mr = new;
+			mr->r_dentry = root;
+			INIT_HLIST_HEAD(&mr->r_list);
+			hlist_add_head(&mr->r_hash, chain);
+
+			spin_lock(&root->d_lock);
+			root->d_flags |= DCACHE_MOUNTROOT;
+			spin_unlock(&root->d_lock);
+		}
+	}
+	mnt->mnt.mnt_root = root;
+	hlist_add_head(&mnt->mnt_mr_list, &mr->r_list);
+	read_sequnlock_excl(&mount_lock);
+
+	return 0;
+}
+
+static void mnt_put_root(struct mount *mnt)
+{
+	struct dentry *root = mnt->mnt.mnt_root;
+	struct mountroot *mr;
+
+	read_seqlock_excl(&mount_lock);
+	mr = lookup_mountroot(root);
+	BUG_ON(!mr);
+	hlist_del(&mnt->mnt_mr_list);
+	if (hlist_empty(&mr->r_list)) {
+		hlist_del(&mr->r_hash);
+		spin_lock(&root->d_lock);
+		root->d_flags &= ~DCACHE_MOUNTROOT;
+		spin_unlock(&root->d_lock);
+		kfree(mr);
+	}
+	read_sequnlock_excl(&mount_lock);
+	dput(root);
+}
+
 static inline int check_mnt(struct mount *mnt)
 {
 	return mnt->mnt_ns == current->nsproxy->mnt_ns;
@@ -934,6 +1026,7 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 {
 	struct mount *mnt;
 	struct dentry *root;
+	int err;
 
 	if (!type)
 		return ERR_PTR(-ENODEV);
@@ -952,8 +1045,16 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 		return ERR_CAST(root);
 	}
 
-	mnt->mnt.mnt_root = root;
 	mnt->mnt.mnt_sb = root->d_sb;
+	err = mnt_set_root(mnt, root);
+	if (err) {
+		dput(root);
+		deactivate_super(mnt->mnt.mnt_sb);
+		mnt_free_id(mnt);
+		free_vfsmnt(mnt);
+		return ERR_PTR(err);
+	}
+
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
 	mnt->mnt_parent = mnt;
 	lock_mount_hash();
@@ -985,6 +1086,10 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 			goto out_free;
 	}
 
+	err = mnt_set_root(mnt, root);
+	if (err)
+		goto out_free;
+
 	mnt->mnt.mnt_flags = old->mnt.mnt_flags & ~(MNT_WRITE_HOLD|MNT_MARKED);
 	/* Don't allow unprivileged users to change mount flags */
 	if (flag & CL_UNPRIVILEGED) {
@@ -1010,7 +1115,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 
 	atomic_inc(&sb->s_active);
 	mnt->mnt.mnt_sb = sb;
-	mnt->mnt.mnt_root = dget(root);
+	dget(root);
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
 	mnt->mnt_parent = mnt;
 	lock_mount_hash();
@@ -1063,7 +1168,7 @@ static void cleanup_mnt(struct mount *mnt)
 	if (unlikely(mnt->mnt_pins.first))
 		mnt_pin_kill(mnt);
 	fsnotify_vfsmount_delete(&mnt->mnt);
-	dput(mnt->mnt.mnt_root);
+	mnt_put_root(mnt);
 	deactivate_super(mnt->mnt.mnt_sb);
 	mnt_free_id(mnt);
 	call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
@@ -3096,14 +3201,21 @@ void __init mnt_init(void)
 				mphash_entries, 19,
 				0,
 				&mp_hash_shift, &mp_hash_mask, 0, 0);
+	mountroot_hashtable = alloc_large_system_hash("Mountroot-cache",
+				sizeof(struct hlist_head),
+				mrhash_entries, 19,
+				0,
+				&mr_hash_shift, &mr_hash_mask, 0, 0);
 
-	if (!mount_hashtable || !mountpoint_hashtable)
+	if (!mount_hashtable || !mountpoint_hashtable || !mountroot_hashtable)
 		panic("Failed to allocate mount hash table\n");
 
 	for (u = 0; u <= m_hash_mask; u++)
 		INIT_HLIST_HEAD(&mount_hashtable[u]);
 	for (u = 0; u <= mp_hash_mask; u++)
 		INIT_HLIST_HEAD(&mountpoint_hashtable[u]);
+	for (u = 0; u <= mr_hash_mask; u++)
+		INIT_HLIST_HEAD(&mountroot_hashtable[u]);
 
 	kernfs_init();
 
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index d2d50249b7b2..06bed2a1053c 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -228,6 +228,8 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_OP_SELECT_INODE		0x02000000 /* Unioned entry: dcache op selects inode */
 
+#define DCACHE_MOUNTROOT		0x04000000 /* Root of a vfsmount */
+
 extern seqlock_t rename_lock;
 
 /*
@@ -403,6 +405,11 @@ static inline bool d_mountpoint(const struct dentry *dentry)
 	return dentry->d_flags & DCACHE_MOUNTED;
 }
 
+static inline bool d_mountroot(const struct dentry *dentry)
+{
+	return dentry->d_flags & DCACHE_MOUNTROOT;
+}
+
 /*
  * Directory cache entry type accessor functions.
  */
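
For readers who want to try the bookkeeping outside the kernel, here is a
minimal userspace sketch of the pattern mnt_set_root() and mnt_put_root()
follow: hash the pointer, look the entry up under the lock, drop the lock to
allocate, then look up again before inserting. It is an illustration under
stated assumptions, not the patch's code. All names (root_entry, track_root,
untrack_root, root_users) are hypothetical, a pthread mutex stands in for
mount_lock, and a plain counter stands in for the r_list of mounts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define BUCKET_SHIFT 6                      /* assumed table size: 64 buckets */
#define NBUCKETS     (1u << BUCKET_SHIFT)
#define CACHE_LINE   64                     /* stand-in for L1_CACHE_BYTES */

struct root_entry {                         /* plays the role of struct mountroot */
	struct root_entry *next;            /* hash chain */
	const void *key;                    /* tracked object (a dentry in the patch) */
	unsigned int users;                 /* simplification of r_list: just a count */
};

static struct root_entry *buckets[NBUCKETS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same shape as mr_hash(): drop low pointer bits, fold, then mask. */
static unsigned int bucket_of(const void *key)
{
	uintptr_t tmp = (uintptr_t)key / CACHE_LINE;

	tmp += tmp >> BUCKET_SHIFT;
	return (unsigned int)(tmp & (NBUCKETS - 1));
}

/* Caller must hold table_lock. */
static struct root_entry *lookup(const void *key)
{
	struct root_entry *e;

	for (e = buckets[bucket_of(key)]; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Register one more user of key; returns 0, or -1 if allocation fails. */
int track_root(const void *key)
{
	struct root_entry *e, *new;

	pthread_mutex_lock(&table_lock);
	e = lookup(key);
	if (!e) {
		/* Allocate with the lock dropped, then re-check for a racing insert. */
		pthread_mutex_unlock(&table_lock);
		new = malloc(sizeof(*new));
		if (!new)
			return -1;
		pthread_mutex_lock(&table_lock);
		e = lookup(key);
		if (e) {
			free(new);          /* somebody else inserted first */
		} else {
			unsigned int b = bucket_of(key);

			e = new;
			e->key = key;
			e->users = 0;
			e->next = buckets[b];
			buckets[b] = e;
		}
	}
	e->users++;
	pthread_mutex_unlock(&table_lock);
	return 0;
}

/* Drop one user of key; the entry is freed when the last user goes away. */
void untrack_root(const void *key)
{
	struct root_entry **pp, *e;

	pthread_mutex_lock(&table_lock);
	for (pp = &buckets[bucket_of(key)]; (e = *pp) != NULL; pp = &e->next)
		if (e->key == key)
			break;
	assert(e != NULL);                  /* mirrors BUG_ON(!mr) */
	if (--e->users == 0) {
		*pp = e->next;
		free(e);
	}
	pthread_mutex_unlock(&table_lock);
}

/* Number of current users of key, or 0 if it is not tracked. */
unsigned int root_users(const void *key)
{
	struct root_entry *e;
	unsigned int n;

	pthread_mutex_lock(&table_lock);
	e = lookup(key);
	n = e ? e->users : 0;
	pthread_mutex_unlock(&table_lock);
	return n;
}
```

Calling track_root() twice on the same address and once on another leaves
two entries with counts 2 and 1. Each untrack_root() call decrements a count,
and the entry is freed when its count reaches zero, much as the patch clears
DCACHE_MOUNTROOT and frees the mountroot once r_list is empty.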