Message ID | 20160721164014.17534-2-ebiederm@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
"Eric W. Biederman" <ebiederm@xmission.com> writes: > Limit per userns sysctls to only be opened for write by a holder > of CAP_SYS_RESOURCE. > > Add all of the necessary boilerplate for having per user namespace > sysctls. > @@ -141,6 +215,7 @@ void free_user_ns(struct user_namespace *ns) > > do { > parent = ns->parent; > + retire_userns_sysctls(ns); ^^^^^^^^^^ Unfortunately it is not safe to call a sleeping function here so this part needs to be taken back to the drawing board. Which means this change gets has to wait for next cycle. > #ifdef CONFIG_PERSISTENT_KEYRINGS > key_put(ns->persistent_keyring_register); > #endif Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: ebiederm@xmission.com (Eric W. Biederman) Date: Mon, 25 Jul 2016 19:02:01 -0500 > Which means this change gets has to wait for next cycle. Ok. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller <davem@davemloft.net> writes: > From: ebiederm@xmission.com (Eric W. Biederman) > Date: Mon, 25 Jul 2016 19:02:01 -0500 > >> Which means this change gets has to wait for next cycle. > > Ok. For clarity I intend to merge these changes through the userns tree, when the issues are resolved. I Cc'd netdev as there is a limit on the number of network namespaces in this set which may be of interest to networking folks. I expect there will be some follow on about adding sanity checking limits to other kernel data structures like a maximum number of mounts in a mount namespace, and perhaps a maximum number of routes in a network namespace. User namespaces have enabled unprivileged users access to a lot more data structures and so to catch programs that go crazy we need a lot more limits. I believe some of those limits make sense per namespace. As it is easy in some cases to say any more than Y number of those per namespace is excessive. For example a limit of 1,000,000 ipv4 routes per network namespaces is a sanity check as there are currently 621,649 ipv4 prefixes advertized in bgp. But that is something to worry about after the merge window. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: ebiederm@xmission.com (Eric W. Biederman) Date: Mon, 25 Jul 2016 19:44:50 -0500 > User namespaces have enabled unprivileged users access to a lot more > data structures and so to catch programs that go crazy we need a lot > more limits. I believe some of those limits make sense per namespace. > As it is easy in some cases to say any more than Y number of those > per namespace is excessive. For example a limit of 1,000,000 ipv4 > routes per network namespaces is a sanity check as there are > currently 621,649 ipv4 prefixes advertized in bgp. When we give a new namespace to unprivileged users, we honestly should make the sysctl settings we give to them become "limits". They can further constrain the sysctl settings but may not raise them. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller <davem@davemloft.net> writes: > From: ebiederm@xmission.com (Eric W. Biederman) > Date: Mon, 25 Jul 2016 19:44:50 -0500 > >> User namespaces have enabled unprivileged users access to a lot more >> data structures and so to catch programs that go crazy we need a lot >> more limits. I believe some of those limits make sense per namespace. >> As it is easy in some cases to say any more than Y number of those >> per namespace is excessive. For example a limit of 1,000,000 ipv4 >> routes per network namespaces is a sanity check as there are >> currently 621,649 ipv4 prefixes advertized in bgp. > > When we give a new namespace to unprivileged users, we honestly should > make the sysctl settings we give to them become "limits". They can > further constrain the sysctl settings but may not raise them. I won't disagree. I was thinking in terms of global setting that hold the limits for per namespace counters. As we are talking sanity check limits. Perhaps we could get sophisticated and do something more but the simpler we can make things and get the job done the better. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 9217169c64cb..7d59af1f08f1 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -39,6 +39,10 @@ struct user_namespace { struct key *persistent_keyring_register; struct rw_semaphore persistent_keyring_register_sem; #endif +#ifdef CONFIG_SYSCTL + struct ctl_table_set set; + struct ctl_table_header *sysctls; +#endif }; extern struct user_namespace init_user_ns; diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 68f594212759..10afbb55dfc2 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -30,6 +30,70 @@ static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *map); +#ifdef CONFIG_SYSCTL +static struct ctl_table_set * +set_lookup(struct ctl_table_root *root) +{ + return ¤t_user_ns()->set; +} + +static int set_is_seen(struct ctl_table_set *set) +{ + return ¤t_user_ns()->set == set; +} + +static int set_permissions(struct ctl_table_header *head, + struct ctl_table *table) +{ + struct user_namespace *user_ns = + container_of(head->set, struct user_namespace, set); + int mode; + + /* Allow users with CAP_SYS_RESOURCE unrestrained access */ + if (ns_capable(user_ns, CAP_SYS_RESOURCE)) + mode = (table->mode & S_IRWXU) >> 6; + else + /* Allow all others at most read-only access */ + mode = table->mode & S_IROTH; + return (mode << 6) | (mode << 3) | mode; +} + +static struct ctl_table_root set_root = { + .lookup = set_lookup, + .permissions = set_permissions, +}; + +static struct ctl_table userns_table[] = { + { } +}; +#endif /* CONFIG_SYSCTL */ + +static bool setup_userns_sysctls(struct user_namespace *ns) +{ +#ifdef CONFIG_SYSCTL + struct ctl_table *tbl; + setup_sysctl_set(&ns->set, &set_root, set_is_seen); + tbl = kmemdup(userns_table, sizeof(userns_table), GFP_KERNEL); + if (tbl) { + ns->sysctls = __register_sysctl_table(&ns->set, "userns", tbl); + } + if (!ns->sysctls) { + kfree(tbl); + retire_sysctl_set(&ns->set); + return false; + } +#endif + return true; +} + +static void retire_userns_sysctls(struct user_namespace *ns) +{ +#ifdef CONFIG_SYSCTL + unregister_sysctl_table(ns->sysctls); + retire_sysctl_set(&ns->set); +#endif +} + static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns) { /* Start with the same capabilities as init but useless for doing @@ -107,12 +171,22 @@ int create_user_ns(struct cred *new) ns->flags = parent_ns->flags; mutex_unlock(&userns_state_mutex); - set_cred_user_ns(new, ns); - #ifdef CONFIG_PERSISTENT_KEYRINGS init_rwsem(&ns->persistent_keyring_register_sem); #endif + ret = -ENOMEM; + if (!setup_userns_sysctls(ns)) + goto fail_keyring; + + set_cred_user_ns(new, ns); return 0; +fail_keyring: +#ifdef CONFIG_PERSISTENT_KEYRINGS + key_put(ns->persistent_keyring_register); +#endif + ns_free_inum(&ns->ns); + kmem_cache_free(user_ns_cachep, ns); + return ret; } int unshare_userns(unsigned long unshare_flags, struct cred **new_cred) @@ -141,6 +215,7 @@ void free_user_ns(struct user_namespace *ns) do { parent = ns->parent; + retire_userns_sysctls(ns); #ifdef CONFIG_PERSISTENT_KEYRINGS key_put(ns->persistent_keyring_register); #endif @@ -1012,9 +1087,26 @@ const struct proc_ns_operations userns_operations = { .install = userns_install, }; +static __init void user_namespace_sysctl_init(void) +{ +#ifdef CONFIG_SYSCTL + static struct ctl_table_header *userns_header; + static struct ctl_table empty[1]; + /* + * It is necessary to register the userns directory in the + * default set so that registrations in the child sets work + * properly. + */ + userns_header = register_sysctl("userns", empty); + BUG_ON(!userns_header); + BUG_ON(!setup_userns_sysctls(&init_user_ns)); +#endif +} + static __init int user_namespaces_init(void) { user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC); + user_namespace_sysctl_init(); return 0; } subsys_initcall(user_namespaces_init);
Limit per userns sysctls to only be opened for write by a holder of CAP_SYS_RESOURCE. Add all of the necessary boilerplate for having per user namespace sysctls. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/user_namespace.h | 4 ++ kernel/user_namespace.c | 96 +++++++++++++++++++++++++++++++++++++++++- 2 files changed, 98 insertions(+), 2 deletions(-)