From patchwork Sat Jun 6 18:56:54 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman"
X-Patchwork-Id: 6560041
From: ebiederm@xmission.com (Eric W. Biederman)
To: Richard Weinberger
Cc: Serge Hallyn, Andy Lutomirski, Seth Forshee, Linux API,
 Linux Containers, Greg Kroah-Hartman, Kenton Varda,
 Michael Kerrisk-manpages, Linux FS Devel, Tejun Heo,
 "libvir-list\@redhat.com", "Daniel P. Berrange", Cedric Bosdonnat
References: <87pp63jcca.fsf@x220.int.ebiederm.org>
 <87siaxuvik.fsf@x220.int.ebiederm.org>
 <87wq004im1.fsf@x220.int.ebiederm.org>
 <20150528140839.GD28842@ubuntumail>
 <55676E32.3050006@nod.at>
 <87382gh3uo.fsf@x220.int.ebiederm.org>
 <55677AEF.1090809@nod.at>
 <87iobcfkwx.fsf@x220.int.ebiederm.org>
 <556831CF.9040600@nod.at>
Date: Sat, 06 Jun 2015 13:56:54 -0500
In-Reply-To: <556831CF.9040600@nod.at> (Richard Weinberger's message of
 "Fri, 29 May 2015 11:30:55 +0200")
Message-ID: <87mw0c1x8p.fsf@x220.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Subject: Re: [CFT][PATCH 00/10] Making new mounts of proc and sysfs as safe
 as bind mounts (take 2)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using
 ClamSMTP

Richard Weinberger writes:

> [CC'ing libvirt-lxc folks]
>
> On 28.05.2015 at 23:32, Eric W. Biederman wrote:
>> Richard Weinberger writes:
>>
>>> On 28.05.2015 at 21:57, Eric W. Biederman wrote:
>>>>> FWIW, it also breaks libvirt-lxc:
>>>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
>>>>
>>>> Interesting. I had not anticipated a failure there? And it is failing
>>>> in remount? Oh that is interesting.
>>>>
>>>> That implies that there is some flag of the original mount of /proc that
>>>> the remount of /proc/sys is clearing, and that previously
>>>>
>>>> The flags specified are currently rdonly,remount,bind so I expect there
>>>> are some other flags on proc that libvirt-lxc is clearing by accident
>>>> and we did not fail before because the kernel was not enforcing things.
>>>
>>> Please see:
>>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933
>>> lxcContainerMountBasicFS()
>>>
>>> and:
>>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850
>>> lxcBasicMounts
>>>
>>>> What are the mount flags in a working libvirt-lxc?
>>>
>>> See:
>>> test1:~ # cat /proc/self/mountinfo
>>> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
>>
>>> If you need more info, please let me know. :-)
>>
>> Oh interesting I had not realized libvirt-lxc had grown an unprivileged
>> mode using user namespaces.
>>
>> This does appear to be a classic remount bug, where you are not
>> preserving the permissions. It appears the fact that the code
>> failed to enforce locked permissions on the fresh mount of proc
>> was hiding this bug until now.
>>
>> I expect what you actually want is the code below:
>>
>> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
>> index 9a9ae5c2aaf0..f008a7484bfe 100644
>> --- a/src/lxc/lxc_container.c
>> +++ b/src/lxc/lxc_container.c
>> @@ -850,7 +850,7 @@ typedef struct {
>>  
>>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>>      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
>> -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
>> +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
>>      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>>      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>>      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
>>
>> Or possibly just:
>>
>> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
>> index 9a9ae5c2aaf0..a60ccbd12bfc 100644
>> --- a/src/lxc/lxc_container.c
>> +++ b/src/lxc/lxc_container.c
>> @@ -850,7 +850,7 @@ typedef struct {
>>  
>>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>>      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
>> -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
>> +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
>>      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>>      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>>      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
>>
>> There is little point in making /proc/sys read-only in a
>> user-namespace, as the permission checks are uid based and no-one should
>> have the global uid 0 in your container. That makes mounting /proc/sys
>> read-only rather pointless.
>
> Eric, using the patch below I was able to spawn a user-namespace enabled container
> using libvirt-lxc. :-)
>
> I had to:
> 1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace case.
> 2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the non user-namespace case.
> 3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to keep this, though).
>
> Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user namespaces are configured.
> With Eric's fixes they hurt us. And as I wrote many times before, if root within the user-namespace
> is able to do nasty things in /sys and /proc that's a plain kernel bug which needs fixing. There is no
> point in mounting these read-only, except for the case when no user-namespace is used.
>

For clarity, the patch below appears to be the minimal change needed to
fix this security issue: OR mnt_mflags back in when remounting something
read-only. The /proc/sys entry also needed to be updated so that it
carries the proper flags to be added back in.

I hope this helps.
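To illustrate the remount rule in isolation, here is a minimal sketch,
not libvirt code. It assumes the caller has the options field of the
existing mount from /proc/self/mountinfo (for example
"rw,nosuid,nodev,noexec,relatime") and wants the flags for a read-only
bind remount that does not clear nosuid, nodev or noexec. The helper
names flags_from_mount_opts() and readonly_remount_flags() are made up
for this example, and only those three options are handled.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: map a comma-separated mount option string, as
 * printed in /proc/self/mountinfo, to the MS_* flags that a remount has
 * to repeat. Only nosuid, nodev and noexec are recognized here. */
static unsigned long flags_from_mount_opts(const char *opts)
{
    static const struct {
        const char *name;
        unsigned long flag;
    } known[] = {
        { "nosuid", MS_NOSUID },
        { "nodev",  MS_NODEV },
        { "noexec", MS_NOEXEC },
    };
    unsigned long flags = 0;

    while (*opts) {
        /* Length of the current option, up to the next comma or NUL. */
        size_t len = strcspn(opts, ",");

        for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
            if (strlen(known[i].name) == len &&
                strncmp(opts, known[i].name, len) == 0)
                flags |= known[i].flag;
        }
        opts += len;
        if (*opts == ',')
            opts++;
    }
    return flags;
}

/* Hypothetical helper: flags for a read-only bind remount that keeps
 * the existing nosuid/nodev/noexec settings instead of dropping them. */
static unsigned long readonly_remount_flags(const char *current_opts)
{
    return MS_BIND | MS_REMOUNT | MS_RDONLY |
           flags_from_mount_opts(current_opts);
}
```

With the options of the /proc mount quoted above this yields 0x102f,
where the failing call passed 0x1021 (MS_BIND|MS_REMOUNT|MS_RDONLY, the
flags=1021 in the error message). The result would be passed as the
mountflags argument of mount(2).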
Eric

---
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c2aaf0..11e9514e0761 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,7 +850,7 @@ typedef struct {
 
 static const virLXCBasicMountInfo lxcBasicMounts[] = {
     { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
-    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
+    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
     { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
     { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
     { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
@@ -1030,7 +1030,7 @@ static int lxcContainerMountBasicFS(bool userns_enabled,
 
         if (bindOverReadonly &&
             mount(mnt_src, mnt->dst, NULL,
-                  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
+                  MS_BIND|MS_REMOUNT|mnt_mflags|MS_RDONLY, NULL) < 0) {
             virReportSystemError(errno,
                                  _("Failed to re-mount %s on %s flags=%x"),
                                  mnt_src, mnt->dst,