From patchwork Mon May 20 13:32:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951123 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D08061390 for ; Mon, 20 May 2019 13:34:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C2BBA28879 for ; Mon, 20 May 2019 13:34:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B5FB2287D3; Mon, 20 May 2019 13:34:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF71228893 for ; Mon, 20 May 2019 13:34:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732284AbfETNeE (ORCPT ); Mon, 20 May 2019 09:34:04 -0400 Received: from mx2.mailbox.org ([80.241.60.215]:61390 "EHLO mx2.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730481AbfETNeD (ORCPT ); Mon, 20 May 2019 09:34:03 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx2.mailbox.org (Postfix) with ESMTPS id 1CDAEA1039; Mon, 20 May 2019 15:33:59 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id oWht5QRrR2YG; Mon, 20 May 2019 15:33:55 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Andy Lutomirski , Christian Brauner , Eric Biederman , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 01/10] namei: obey trailing magic-link DAC permissions Date: Mon, 20 May 2019 23:32:56 +1000 Message-Id: <20190520133305.11925-2-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The ability for userspace to "re-open" file descriptors through /proc/self/fd has been a very useful tool for all sorts of usecases (container runtimes are one common example). However, the current interface for doing this has resulted in some pretty subtle security holes. Userspace can re-open a file descriptor with more permissions than the original, which can result in cases such as /proc/$pid/exe being re-opened O_RDWR at a later date even though (by definition) /proc/$pid/exe cannot be opened for writing. When combined with O_PATH the results can get even more confusing. We cannot block this outright. Aside from userspace already depending on it, it's a useful feature which can actually increase the security of userspace. For instance, LXC keeps an O_PATH of the container's /dev/pts/ptmx that gets re-opened to create new ptys and then uses TIOCGPTPEER to get the slave end. This allows for pty allocation without resolving paths inside an (untrusted) container's rootfs. There isn't a trivial way of doing this that is as straight-forward and safe as O_PATH re-opening. Instead we have to restrict it in such a way that it doesn't break (good) users but does block potential attackers. The solution applied in this patch is to restrict *re-opening* (not resolution through) magic-links by requiring that mode of the link be obeyed. Normal symlinks have modes of a+rwx but magic-links have other modes. These magic-link modes were historically ignored during path resolution, but they've now been re-purposed for more useful ends. It is also necessary to define semantics for the mode of an O_PATH descriptor, since re-opening a magic-link through an O_PATH needs to be just as restricted as the corresponding magic-link otherwise the above protection can be bypassed. There are two distinct cases: 1. The target is a regular file (not a magic-link). Userspace depends on being able to re-open the O_PATH of a regular file, so we must define the mode to be a+rwx. 2. The target is a magic-link. In this case, we simply copy the mode of the magic-link. This results in an O_PATH of a magic-link effectively acting as a no-op in terms of how much re-opening privileges a process has. CAP_DAC_OVERRIDE can be used to override all of these restrictions, but we only permit &init_userns's capabilities to affect these semantics. The reason for this is that there isn't a clear way to track what user_ns is the original owner of a given O_PATH chain -- thus an unprivileged user could create a new userns and O_PATH the file descriptor, owning it. All signs would indicate that the user really does have CAP_DAC_OVERRIDE over the new descriptor and the protection would be bypassed. We thus opt for the more conservative approach. One final exception is given, which is that non-O_PATH file descriptors are given re-open rights equivalent to the permissions available at open-time. This allows for O_RDONLY file descriptors to be re-opened O_RDWR as long as the user had MAY_WRITE access at the time of opening the O_RDONLY descriptor. This is necessary to avoid breaking userspace (some of the kernel's own selftests depended on this "feature"). Co-developed-by: Andy Lutomirski Co-developed-by: Christian Brauner Signed-off-by: Aleksa Sarai --- fs/internal.h | 1 + fs/namei.c | 117 ++++++++++++++++++++++++++++++++++++++++-- fs/open.c | 3 +- fs/proc/fd.c | 16 ++++-- include/linux/fs.h | 4 ++ include/linux/types.h | 2 +- 6 files changed, 132 insertions(+), 11 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index bfd9bbbe369b..60d8a079a331 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -123,6 +123,7 @@ struct open_flags { int acc_mode; int intent; int lookup_flags; + fmode_t opath_mask; }; extern struct file *do_filp_open(int dfd, struct filename *pathname, const struct open_flags *op); diff --git a/fs/namei.c b/fs/namei.c index 5ebd64b21970..5ff61624b8f0 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -506,6 +506,8 @@ struct nameidata { struct inode *link_inode; unsigned root_seq; int dfd; + fmode_t opath_mask; + int acc_mode; /* op.acc_mode */ } __randomize_layout; static void set_nameidata(struct nameidata *p, int dfd, struct filename *name) @@ -514,7 +516,14 @@ static void set_nameidata(struct nameidata *p, int dfd, struct filename *name) p->stack = p->internal; p->dfd = dfd; p->name = name; - p->total_link_count = old ? old->total_link_count : 0; + p->total_link_count = 0; + p->acc_mode = 0; + p->opath_mask = FMODE_PATH_READ | FMODE_PATH_WRITE; + if (old) { + p->total_link_count = old->total_link_count; + p->acc_mode = old->acc_mode; + p->opath_mask = old->opath_mask; + } p->saved = old; current->nameidata = p; } @@ -1042,8 +1051,40 @@ static int may_create_in_sticky(struct dentry * const dir, return 0; } +/** + * may_reopen_magiclink - Check permissions for opening a trailing magic-link + * @opath_mask: the O_PATH mask of the magiclink + * @acc_mode: ACC_MODE which the user is attempting + * + * We block magic-link re-opening if the @opath_mask is more strict than the + * @acc_mode being requested, unless the user is capable(CAP_DAC_OVERRIDE). + * + * Returns 0 if successful, -ve on error. + */ +static int may_open_magiclink(fmode_t opath_mask, int acc_mode) +{ + /* + * We only allow for init_userns to be able to override magic-links. + * This is done to avoid cases where an unprivileger userns could take + * an O_PATH of the fd, resulting in it being very unclear whether + * CAP_DAC_OVERRIDE should work on the new O_PATH fd (given that it + * pipes through to the underlying file). + */ + if (capable(CAP_DAC_OVERRIDE)) + return 0; + + if ((acc_mode & MAY_READ) && + !(opath_mask & (FMODE_READ | FMODE_PATH_READ))) + return -EACCES; + if ((acc_mode & MAY_WRITE) && + !(opath_mask & (FMODE_WRITE | FMODE_PATH_WRITE))) + return -EACCES; + + return 0; +} + static __always_inline -const char *get_link(struct nameidata *nd) +const char *get_link(struct nameidata *nd, bool trailing) { struct saved *last = nd->stack + nd->depth - 1; struct dentry *dentry = last->link.dentry; @@ -1081,6 +1122,50 @@ const char *get_link(struct nameidata *nd) } else { res = get(dentry, inode, &last->done); } + /* If we just jumped it was because of a magic-link. */ + if (unlikely(nd->flags & LOOKUP_JUMPED)) { + /* + * For trailing_symlink we check whether the symlink's + * mode allows us to do what we want through acc_mode. + * In addition, we need to stash away what the link + * mode is in case we are about to O_PATH this + * magic-link. + * + * This is only done for magic-links, as a security + * measure to prevent users from being able to re-open + * files with additional permissions or similar tricks + * through procfs. This is not strictly POSIX-friendly, + * but technically neither are magic-links. + */ + if (trailing) { + fmode_t opath_mask = 0; + int mode = inode->i_mode; + + /* + * Bare-bones acl_permission_check shift so + * that opath_mask can be adjusted using creds. + */ + if (uid_eq(current_fsuid(), inode->i_uid)) + mode >>= 6; + else if (in_group_p(inode->i_gid)) + mode >>= 3; + + /* Figure out the O_PATH mask. */ + if (mode & MAY_READ) + opath_mask |= FMODE_PATH_READ; + if (mode & MAY_WRITE) + opath_mask |= FMODE_PATH_WRITE; + + /* + * Is the new opath_mask more restrictive than + * the acc_mode being requested? + */ + error = may_open_magiclink(opath_mask, nd->acc_mode); + if (error) + return ERR_PTR(error); + nd->opath_mask &= opath_mask; + } + } if (IS_ERR_OR_NULL(res)) return res; } @@ -2142,7 +2227,7 @@ static int link_path_walk(const char *name, struct nameidata *nd) return err; if (err) { - const char *s = get_link(nd); + const char *s = get_link(nd, false); if (IS_ERR(s)) return PTR_ERR(s); @@ -2258,7 +2343,7 @@ static const char *trailing_symlink(struct nameidata *nd) return ERR_PTR(error); nd->flags |= LOOKUP_PARENT; nd->stack[0].name = NULL; - s = get_link(nd); + s = get_link(nd, true); return s ? s : ""; } @@ -3508,6 +3593,7 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file) if (!error) { audit_inode(nd->name, path.dentry, 0); error = vfs_open(&path, file); + file->f_mode |= nd->opath_mask; path_put(&path); } return error; @@ -3519,6 +3605,9 @@ static struct file *path_openat(struct nameidata *nd, struct file *file; int error; + nd->acc_mode = op->acc_mode; + nd->opath_mask = op->opath_mask; + file = alloc_empty_file(op->open_flag, current_cred()); if (IS_ERR(file)) return file; @@ -3537,6 +3626,26 @@ static struct file *path_openat(struct nameidata *nd, terminate_walk(nd); } if (likely(!error)) { + /* + * In order to allow for re-opening of a read-only fds as + * read-write (something that some userspace tools depend on, + * including some kernel selftests), we give additional + * permissions based on what permissions are available at + * open-time. This must be scoped to opath_mask to avoid + * obvious O_PATH attacks. + */ + if (likely(!(file->f_mode & FMODE_PATH))) { + fmode_t add_perms = 0; + + if (!(nd->acc_mode & MAY_READ) && + !inode_permission(file->f_inode, MAY_READ)) + add_perms |= FMODE_PATH_READ; + if (!(nd->acc_mode & MAY_WRITE) && + !inode_permission(file->f_inode, MAY_WRITE)) + add_perms |= FMODE_PATH_WRITE; + + file->f_mode |= add_perms & nd->opath_mask; + } if (likely(file->f_mode & FMODE_OPENED)) return file; WARN_ON(1); diff --git a/fs/open.c b/fs/open.c index a00350018a47..c1d4f343e85f 100644 --- a/fs/open.c +++ b/fs/open.c @@ -981,8 +981,9 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o acc_mode |= MAY_APPEND; op->acc_mode = acc_mode; - op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN; + /* For O_PATH backwards-compatibility we default to an all-set mask. */ + op->opath_mask = FMODE_PATH_READ | FMODE_PATH_WRITE; if (flags & O_CREAT) { op->intent |= LOOKUP_CREATE; diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 81882a13212d..8c0ccf2b703b 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -104,11 +104,17 @@ static void tid_fd_update_inode(struct task_struct *task, struct inode *inode, task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid); if (S_ISLNK(inode->i_mode)) { - unsigned i_mode = S_IFLNK; - if (f_mode & FMODE_READ) - i_mode |= S_IRUSR | S_IXUSR; - if (f_mode & FMODE_WRITE) - i_mode |= S_IWUSR | S_IXUSR; + unsigned i_mode = S_IFLNK | S_IXUGO; + /* + * Construct the mode bits based on the open-mode. Note that + * giving o+rwx permissions here is not a problem since all + * /proc/self/fd magic-link resolution is gated with + * ptrace_may_access(). + */ + if (f_mode & (FMODE_READ | FMODE_PATH_READ)) + i_mode |= S_IRUGO; + if (f_mode & (FMODE_WRITE | FMODE_PATH_WRITE)) + i_mode |= S_IWUGO; inode->i_mode = i_mode; } security_task_to_inode(task, inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index fe94c48481a6..0b2af61411cb 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -173,6 +173,10 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File does not contribute to nr_files count */ #define FMODE_NOACCOUNT ((__force fmode_t)0x20000000) +/* File is an O_PATH descriptor which can be upgraded to (read, write). */ +#define FMODE_PATH_READ ((__force fmode_t)0x40000000) +#define FMODE_PATH_WRITE ((__force fmode_t)0x80000000) + /* * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector * that indicates that they should check the contents of the iovec are diff --git a/include/linux/types.h b/include/linux/types.h index cc0dbbe551d5..45aaf711885c 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -157,7 +157,7 @@ typedef u32 dma_addr_t; typedef unsigned int __bitwise gfp_t; typedef unsigned int __bitwise slab_flags_t; -typedef unsigned int __bitwise fmode_t; +typedef unsigned long __bitwise fmode_t; #ifdef CONFIG_PHYS_ADDR_T_64BIT typedef u64 phys_addr_t; From patchwork Mon May 20 13:32:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951127 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 145281390 for ; Mon, 20 May 2019 13:34:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 055C6287D3 for ; Mon, 20 May 2019 13:34:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id ED3E12888B; Mon, 20 May 2019 13:34:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 513B928895 for ; Mon, 20 May 2019 13:34:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733194AbfETNeP (ORCPT ); Mon, 20 May 2019 09:34:15 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:57532 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730481AbfETNeN (ORCPT ); Mon, 20 May 2019 09:34:13 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id BDEE750132; Mon, 20 May 2019 15:34:10 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter02.heinlein-hosting.de (spamfilter02.heinlein-hosting.de [80.241.56.116]) (amavisd-new, port 10030) with ESMTP id 2jdKq7uC5Csc; Mon, 20 May 2019 15:34:07 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 02/10] procfs: switch magic-link modes to be more sane Date: Mon, 20 May 2019 23:32:57 +1000 Message-Id: <20190520133305.11925-3-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Now that magic-link modes are obeyed for file re-opening purposes, some of the pre-existing magic-link modes need to be adjusted to be more semantically correct. The most blatant example of this is /proc/self/exe, which had a mode of a+rwx even though tautologically the file could never be opened for writing (because it is the current->mm of a live process). With the new O_PATH restrictions, changing the default mode of these magic-links allows us to avoid delayed-access attacks such as we saw in CVE-2019-5736. Signed-off-by: Aleksa Sarai --- fs/proc/base.c | 20 ++++++++++---------- fs/proc/namespaces.c | 2 +- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 6a803a0b75df..17fd447043ff 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -133,9 +133,9 @@ struct pid_entry { #define DIR(NAME, MODE, iops, fops) \ NOD(NAME, (S_IFDIR|(MODE)), &iops, &fops, {} ) -#define LNK(NAME, get_link) \ - NOD(NAME, (S_IFLNK|S_IRWXUGO), \ - &proc_pid_link_inode_operations, NULL, \ +#define LNK(NAME, MODE, get_link) \ + NOD(NAME, (S_IFLNK|(MODE)), \ + &proc_pid_link_inode_operations, NULL, \ { .proc_get_link = get_link } ) #define REG(NAME, MODE, fops) \ NOD(NAME, (S_IFREG|(MODE)), NULL, &fops, {}) @@ -2995,9 +2995,9 @@ static const struct pid_entry tgid_base_stuff[] = { REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), - LNK("cwd", proc_cwd_link), - LNK("root", proc_root_link), - LNK("exe", proc_exe_link), + LNK("cwd", S_IRWXUGO, proc_cwd_link), + LNK("root", S_IRWXUGO, proc_root_link), + LNK("exe", S_IRUGO|S_IXUGO, proc_exe_link), REG("mounts", S_IRUGO, proc_mounts_operations), REG("mountinfo", S_IRUGO, proc_mountinfo_operations), REG("mountstats", S_IRUSR, proc_mountstats_operations), @@ -3394,11 +3394,11 @@ static const struct pid_entry tid_base_stuff[] = { REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations), #endif REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations), - LNK("cwd", proc_cwd_link), - LNK("root", proc_root_link), - LNK("exe", proc_exe_link), + LNK("cwd", S_IRWXUGO, proc_cwd_link), + LNK("root", S_IRWXUGO, proc_root_link), + LNK("exe", S_IRUGO|S_IXUGO, proc_exe_link), REG("mounts", S_IRUGO, proc_mounts_operations), - REG("mountinfo", S_IRUGO, proc_mountinfo_operations), + REG("mountinfo", S_IRUGO, proc_mountinfo_operations), #ifdef CONFIG_PROC_PAGE_MONITOR REG("clear_refs", S_IWUSR, proc_clear_refs_operations), REG("smaps", S_IRUGO, proc_pid_smaps_operations), diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c index dd2b35f78b09..cd1e130913f7 100644 --- a/fs/proc/namespaces.c +++ b/fs/proc/namespaces.c @@ -94,7 +94,7 @@ static struct dentry *proc_ns_instantiate(struct dentry *dentry, struct inode *inode; struct proc_inode *ei; - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFLNK | S_IRWXUGO); + inode = proc_pid_make_inode(dentry->d_sb, task, S_IFLNK | S_IRUGO); if (!inode) return ERR_PTR(-ENOENT); From patchwork Mon May 20 13:32:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951129 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C6D0112C for ; Mon, 20 May 2019 13:34:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0D81D28887 for ; Mon, 20 May 2019 13:34:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0007A2888D; Mon, 20 May 2019 13:34:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 43B9728887 for ; Mon, 20 May 2019 13:34:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387727AbfETNe0 (ORCPT ); Mon, 20 May 2019 09:34:26 -0400 Received: from mx2.mailbox.org ([80.241.60.215]:62748 "EHLO mx2.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387617AbfETNeZ (ORCPT ); Mon, 20 May 2019 09:34:25 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx2.mailbox.org (Postfix) with ESMTPS id 0C83CA1039; Mon, 20 May 2019 15:34:23 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id sFqK244XqxzI; Mon, 20 May 2019 15:34:20 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 03/10] open: O_EMPTYPATH: procfs-less file descriptor re-opening Date: Mon, 20 May 2019 23:32:58 +1000 Message-Id: <20190520133305.11925-4-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Userspace has made use of /proc/self/fd very liberally to allow for descriptors to be re-opened. There are a wide variety of uses for this feature, but it has always required constructing a pathname and could not be done without procfs mounted. The obvious solution for this is to extend openat(2) to have an AT_EMPTY_PATH-equivalent -- O_EMPTYPATH. Now that descriptor re-opening has been made safe through the new magic-link resolution restrictions, we can replicate these restrictions for O_EMPTYPATH. In particular, we only allow "upgrading" the file descriptor if the corresponding FMODE_PATH_* bit is set (or the FMODE_{READ,WRITE} cases for non-O_PATH file descriptors). When doing openat(O_EMPTYPATH|O_PATH), O_PATH takes precedence and O_EMPTYPATH is ignored. Very few users ever have a need to O_PATH re-open an existing file descriptor, and so accommodating them at the expense of further complicating O_PATH makes little sense. Ultimately, if users ask for this we can always add RESOLVE_EMPTY_PATH to resolveat(2) in the future. Signed-off-by: Aleksa Sarai --- fs/fcntl.c | 2 +- fs/namei.c | 27 +++++++++++++++++++++++++++ fs/open.c | 7 ++++++- include/linux/fcntl.h | 2 +- include/uapi/asm-generic/fcntl.h | 5 +++++ 5 files changed, 40 insertions(+), 3 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index 083185174c6d..6c85c4d0c006 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -1031,7 +1031,7 @@ static int __init fcntl_init(void) * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY * is defined as O_NONBLOCK on some platforms and not on others. */ - BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ != + BUILD_BUG_ON(22 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32( (VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) | __FMODE_EXEC | __FMODE_NONOTIFY)); diff --git a/fs/namei.c b/fs/namei.c index 5ff61624b8f0..46c5302eea4a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3599,6 +3599,31 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file) return error; } +static int do_o_emptypath(struct nameidata *nd, struct file *newfile) +{ + int error; + struct fd f; + + /* We don't support AT_FDCWD since O_PATH cannot be set here. */ + f = fdget_raw(nd->dfd); + if (!f.file) + return -EBADF; + + /* Update opath_mask as though we went through trailing_symlink(). */ + if (!(f.file->f_mode & (FMODE_READ | FMODE_PATH_READ))) + nd->opath_mask &= ~FMODE_PATH_READ; + if (!(f.file->f_mode & (FMODE_WRITE | FMODE_PATH_WRITE))) + nd->opath_mask &= ~FMODE_PATH_WRITE; + + /* Obey the same restrictions as magic-links. */ + error = may_open_magiclink(f.file->f_mode, nd->acc_mode); + if (!error) + error = vfs_open(&f.file->f_path, newfile); + + fdput(f); + return error; +} + static struct file *path_openat(struct nameidata *nd, const struct open_flags *op, unsigned flags) { @@ -3614,6 +3639,8 @@ static struct file *path_openat(struct nameidata *nd, if (unlikely(file->f_flags & __O_TMPFILE)) { error = do_tmpfile(nd, flags, op, file); + } else if (unlikely(file->f_flags & O_EMPTYPATH)) { + error = do_o_emptypath(nd, file); } else if (unlikely(file->f_flags & O_PATH)) { error = do_o_path(nd, flags, file); } else { diff --git a/fs/open.c b/fs/open.c index c1d4f343e85f..a6ccfea899b9 100644 --- a/fs/open.c +++ b/fs/open.c @@ -995,6 +995,8 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o lookup_flags |= LOOKUP_DIRECTORY; if (!(flags & O_NOFOLLOW)) lookup_flags |= LOOKUP_FOLLOW; + if (flags & O_EMPTYPATH) + lookup_flags |= LOOKUP_EMPTY; op->lookup_flags = lookup_flags; return 0; } @@ -1056,14 +1058,17 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) { struct open_flags op; int fd = build_open_flags(flags, mode, &op); + int empty = 0; struct filename *tmp; if (fd) return fd; - tmp = getname(filename); + tmp = getname_flags(filename, op.lookup_flags, &empty); if (IS_ERR(tmp)) return PTR_ERR(tmp); + if (!empty) + op.open_flag &= ~O_EMPTYPATH; fd = get_unused_fd_flags(flags); if (fd >= 0) { diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index d019df946cb2..2868ae6c8fc1 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -9,7 +9,7 @@ (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \ O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ - O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) + O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_EMPTYPATH) #ifndef force_o_largefile #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index 9dc0bf0c5a6e..307f7c414a51 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -89,6 +89,11 @@ #define __O_TMPFILE 020000000 #endif +#ifndef O_EMPTYPATH +#define O_EMPTYPATH 040000000 +#endif + + /* a horrid kludge trying to make sure that this will fail on old kernels */ #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) From patchwork Mon May 20 13:32:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951135 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 581381390 for ; Mon, 20 May 2019 13:34:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 49BEA2873A for ; Mon, 20 May 2019 13:34:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3DD3B28895; Mon, 20 May 2019 13:34:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CC8742888C for ; Mon, 20 May 2019 13:34:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732090AbfETNel (ORCPT ); Mon, 20 May 2019 09:34:41 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:59666 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728634AbfETNek (ORCPT ); Mon, 20 May 2019 09:34:40 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 7072950251; Mon, 20 May 2019 15:34:37 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id u25V7i0GyP7E; Mon, 20 May 2019 15:34:32 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 04/10] namei: split out nd->dfd handling to dirfd_path_init Date: Mon, 20 May 2019 23:32:59 +1000 Message-Id: <20190520133305.11925-5-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Previously, path_init's handling of *at(dfd, ...) was only done once, but with LOOKUP_BENEATH (and LOOKUP_IN_ROOT) we have to parse the initial nd->path at different times (before or after absolute path handling) depending on whether we have been asked to scope resolution within a root. Signed-off-by: Aleksa Sarai --- fs/namei.c | 103 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 59 insertions(+), 44 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 46c5302eea4a..dc323e25d4d2 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2251,9 +2251,59 @@ static int link_path_walk(const char *name, struct nameidata *nd) } } +/* + * Configure nd->path based on the nd->dfd. This is only used as part of + * path_init(). + */ +static inline int dirfd_path_init(struct nameidata *nd) +{ + if (nd->dfd == AT_FDCWD) { + if (nd->flags & LOOKUP_RCU) { + struct fs_struct *fs = current->fs; + unsigned seq; + + do { + seq = read_seqcount_begin(&fs->seq); + nd->path = fs->pwd; + nd->inode = nd->path.dentry->d_inode; + nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); + } while (read_seqcount_retry(&fs->seq, seq)); + } else { + get_fs_pwd(current->fs, &nd->path); + nd->inode = nd->path.dentry->d_inode; + } + } else { + /* Caller must check execute permissions on the starting path component */ + struct fd f = fdget_raw(nd->dfd); + struct dentry *dentry; + + if (!f.file) + return -EBADF; + + dentry = f.file->f_path.dentry; + + if (*nd->name->name && unlikely(!d_can_lookup(dentry))) { + fdput(f); + return -ENOTDIR; + } + + nd->path = f.file->f_path; + if (nd->flags & LOOKUP_RCU) { + nd->inode = nd->path.dentry->d_inode; + nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq); + } else { + path_get(&nd->path); + nd->inode = nd->path.dentry->d_inode; + } + fdput(f); + } + return 0; +} + /* must be paired with terminate_walk() */ static const char *path_init(struct nameidata *nd, unsigned flags) { + int error; const char *s = nd->name->name; if (!*s) @@ -2287,52 +2337,17 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->m_seq = read_seqbegin(&mount_lock); if (*s == '/') { - set_root(nd); - if (likely(!nd_jump_root(nd))) - return s; - return ERR_PTR(-ECHILD); - } else if (nd->dfd == AT_FDCWD) { - if (flags & LOOKUP_RCU) { - struct fs_struct *fs = current->fs; - unsigned seq; - - do { - seq = read_seqcount_begin(&fs->seq); - nd->path = fs->pwd; - nd->inode = nd->path.dentry->d_inode; - nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); - } while (read_seqcount_retry(&fs->seq, seq)); - } else { - get_fs_pwd(current->fs, &nd->path); - nd->inode = nd->path.dentry->d_inode; - } - return s; - } else { - /* Caller must check execute permissions on the starting path component */ - struct fd f = fdget_raw(nd->dfd); - struct dentry *dentry; - - if (!f.file) - return ERR_PTR(-EBADF); - - dentry = f.file->f_path.dentry; - - if (*s && unlikely(!d_can_lookup(dentry))) { - fdput(f); - return ERR_PTR(-ENOTDIR); - } - - nd->path = f.file->f_path; - if (flags & LOOKUP_RCU) { - nd->inode = nd->path.dentry->d_inode; - nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq); - } else { - path_get(&nd->path); - nd->inode = nd->path.dentry->d_inode; - } - fdput(f); + if (likely(!nd->root.mnt)) + set_root(nd); + error = nd_jump_root(nd); + if (unlikely(error)) + s = ERR_PTR(error); return s; } + error = dirfd_path_init(nd); + if (unlikely(error)) + return ERR_PTR(error); + return s; } static const char *trailing_symlink(struct nameidata *nd) From patchwork Mon May 20 13:33:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951141 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 585711390 for ; Mon, 20 May 2019 13:35:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 462CB28574 for ; Mon, 20 May 2019 13:35:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3A00028895; Mon, 20 May 2019 13:35:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64B152886A for ; Mon, 20 May 2019 13:35:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388236AbfETNfG (ORCPT ); Mon, 20 May 2019 09:35:06 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:60968 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388042AbfETNfF (ORCPT ); Mon, 20 May 2019 09:35:05 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 12CE64EC6D; Mon, 20 May 2019 15:34:58 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by gerste.heinlein-support.de (gerste.heinlein-support.de [91.198.250.173]) (amavisd-new, port 10030) with ESMTP id buBp-BZaArAW; Mon, 20 May 2019 15:34:46 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Christian Brauner , David Drysdale , Andy Lutomirski , Linus Torvalds , Eric Biederman , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , Chanho Min , Oleg Nesterov , Aleksa Sarai , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 05/10] namei: O_BENEATH-style path resolution flags Date: Mon, 20 May 2019 23:33:00 +1000 Message-Id: <20190520133305.11925-6-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add the following flags to allow various restrictions on path resolution (these affect the *entire* resolution, rather than just the final path component -- as is the case with most other AT_* flags). The primary justification for these flags is to allow for programs to be far more strict about how they want path resolution to handle symlinks, mountpoint crossings, and paths that escape the dirfd (through an absolute path or ".." shenanigans). This is of particular concern to container runtimes that want to be very careful about malicious root filesystems that a container's init might have screwed around with (and there is no real way to protect against this in userspace if you consider potential races against a malicious container's init). More classical applications (which have their own potentially buggy userspace path sanitisation code) include web servers, archive extraction tools, network file servers, and so on. These flags are exposed to userspace in a later patchset. * LOOKUP_XDEV: Disallow mount-point crossing (both *down* into one, or *up* from one). The primary "scoping" use is to blocking resolution that crosses a bind-mount, which has a similar property to a symlink (in the way that it allows for escape from the starting-point). Since it is not possible to differentiate bind-mounts However since bind-mounting requires privileges (in ways symlinks don't) this has been split from LOOKUP_BENEATH. The naming is based on "find -xdev" as well as -EXDEV (though find(1) doesn't walk upwards, the semantics seem obvious). * LOOKUP_NO_MAGICLINKS: Disallows ->get_link "symlink" jumping. This is a very specific restriction, and it exists because /proc/$pid/fd/... "symlinks" allow for access outside nd->root and pose risk to container runtimes that don't want to be tricked into accessing a host path (but do want to allow no-funny-business symlink resolution). * LOOKUP_NO_SYMLINKS: Disallows symlink jumping *of any kind*. Implies LOOKUP_NO_MAGICLINKS (obviously). * LOOKUP_BENEATH: Disallow "escapes" from the starting point of the filesystem tree during resolution (you must stay "beneath" the starting point at all times). Currently this is done by disallowing ".." and absolute paths (either in the given path or found during symlink resolution) entirely, as well as all magic-link jumping. The wholesale banning of ".." is because it is currently not safe to allow ".." resolution (races can cause the path to be moved outside of the root -- this is conceptually similar to historical chroot(2) escape attacks). Future patches in this series will address this, and will re-enable ".." resolution once it is safe. With those patches, ".." resolution will only be allowed if it remains in the root throughout resolution (such as "a/../b" not "a/../../outside/b"). The banning of magic-link jumping is done because it is not clear whether semantically they should be allowed -- while some magic-links are safe there are many that can cause escapes (and once a resolution is outside of the root, O_BENEATH will no longer detect it). Future patches may re-enable magic-link jumping when such jumps would remain inside the root. The LOOKUP_NO_*LINK flags return -ELOOP if path resolution would violates their requirement, while the others all return -EXDEV. This is a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]). Input from Linus and Andy in the AT_NO_JUMPS thread[4] determined most of the API changes made in this refresh. [1]: https://lwn.net/Articles/721443/ [2]: https://lwn.net/Articles/619151/ [3]: https://lwn.net/Articles/603929/ [4]: https://lwn.net/Articles/723057/ Cc: Christian Brauner Suggested-by: David Drysdale Suggested-by: Al Viro Suggested-by: Andy Lutomirski Suggested-by: Linus Torvalds Signed-off-by: Aleksa Sarai --- fs/namei.c | 74 ++++++++++++++++++++++++++++++++++++------- include/linux/namei.h | 7 ++++ 2 files changed, 70 insertions(+), 11 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index dc323e25d4d2..f997c82eb9c2 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -852,6 +852,13 @@ static inline void path_to_nameidata(const struct path *path, static int nd_jump_root(struct nameidata *nd) { + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; + if (unlikely(nd->flags & LOOKUP_XDEV)) { + /* Absolute path arguments to path_init() are allowed. */ + if (nd->path.mnt != NULL && nd->path.mnt != nd->root.mnt) + return -EXDEV; + } if (nd->flags & LOOKUP_RCU) { struct dentry *d; nd->path = nd->root; @@ -1092,6 +1099,9 @@ const char *get_link(struct nameidata *nd, bool trailing) int error; const char *res; + if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS)) + return ERR_PTR(-ELOOP); + if (!(nd->flags & LOOKUP_RCU)) { touch_atime(&last->link); cond_resched(); @@ -1124,6 +1134,11 @@ const char *get_link(struct nameidata *nd, bool trailing) } /* If we just jumped it was because of a magic-link. */ if (unlikely(nd->flags & LOOKUP_JUMPED)) { + if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS)) + return ERR_PTR(-ELOOP); + /* Not currently safe. */ + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return ERR_PTR(-EXDEV); /* * For trailing_symlink we check whether the symlink's * mode allows us to do what we want through acc_mode. @@ -1172,8 +1187,9 @@ const char *get_link(struct nameidata *nd, bool trailing) if (*res == '/') { if (!nd->root.mnt) set_root(nd); - if (unlikely(nd_jump_root(nd))) - return ERR_PTR(-ECHILD); + error = nd_jump_root(nd); + if (unlikely(error)) + return ERR_PTR(error); while (unlikely(*++res == '/')) ; } @@ -1354,12 +1370,16 @@ static int follow_managed(struct path *path, struct nameidata *nd) break; } - if (need_mntput && path->mnt == mnt) - mntput(path->mnt); + if (need_mntput) { + if (path->mnt == mnt) + mntput(path->mnt); + if (unlikely(nd->flags & LOOKUP_XDEV)) + ret = -EXDEV; + else + nd->flags |= LOOKUP_JUMPED; + } if (ret == -EISDIR || !ret) ret = 1; - if (need_mntput) - nd->flags |= LOOKUP_JUMPED; if (unlikely(ret < 0)) path_put_conditional(path, nd); return ret; @@ -1416,6 +1436,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, mounted = __lookup_mnt(path->mnt, path->dentry); if (!mounted) break; + if (unlikely(nd->flags & LOOKUP_XDEV)) + return false; path->mnt = &mounted->mnt; path->dentry = mounted->mnt.mnt_root; nd->flags |= LOOKUP_JUMPED; @@ -1436,8 +1458,11 @@ static int follow_dotdot_rcu(struct nameidata *nd) struct inode *inode = nd->inode; while (1) { - if (path_equal(&nd->path, &nd->root)) + if (path_equal(&nd->path, &nd->root)) { + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; break; + } if (nd->path.dentry != nd->path.mnt->mnt_root) { struct dentry *old = nd->path.dentry; struct dentry *parent = old->d_parent; @@ -1462,6 +1487,8 @@ static int follow_dotdot_rcu(struct nameidata *nd) return -ECHILD; if (&mparent->mnt == nd->path.mnt) break; + if (unlikely(nd->flags & LOOKUP_XDEV)) + return -EXDEV; /* we know that mountpoint was pinned */ nd->path.dentry = mountpoint; nd->path.mnt = &mparent->mnt; @@ -1476,6 +1503,8 @@ static int follow_dotdot_rcu(struct nameidata *nd) return -ECHILD; if (!mounted) break; + if (unlikely(nd->flags & LOOKUP_XDEV)) + return -EXDEV; nd->path.mnt = &mounted->mnt; nd->path.dentry = mounted->mnt.mnt_root; inode = nd->path.dentry->d_inode; @@ -1564,8 +1593,11 @@ static int path_parent_directory(struct path *path) static int follow_dotdot(struct nameidata *nd) { while(1) { - if (path_equal(&nd->path, &nd->root)) + if (path_equal(&nd->path, &nd->root)) { + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; break; + } if (nd->path.dentry != nd->path.mnt->mnt_root) { int ret = path_parent_directory(&nd->path); if (ret) @@ -1574,6 +1606,8 @@ static int follow_dotdot(struct nameidata *nd) } if (!follow_up(&nd->path)) break; + if (unlikely(nd->flags & LOOKUP_XDEV)) + return -EXDEV; } follow_mount(&nd->path); nd->inode = nd->path.dentry->d_inode; @@ -1788,6 +1822,13 @@ static inline int may_lookup(struct nameidata *nd) static inline int handle_dots(struct nameidata *nd, int type) { if (type == LAST_DOTDOT) { + /* + * LOOKUP_BENEATH resolving ".." is not currently safe -- races can + * cause our parent to have moved outside of the root and us to skip + * over it. + */ + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; if (!nd->root.mnt) set_root(nd); if (nd->flags & LOOKUP_RCU) { @@ -2336,6 +2377,15 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.dentry = NULL; nd->m_seq = read_seqbegin(&mount_lock); + + if (unlikely(nd->flags & LOOKUP_BENEATH)) { + error = dirfd_path_init(nd); + if (unlikely(error)) + return ERR_PTR(error); + nd->root = nd->path; + if (!(nd->flags & LOOKUP_RCU)) + path_get(&nd->root); + } if (*s == '/') { if (likely(!nd->root.mnt)) set_root(nd); @@ -2344,9 +2394,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags) s = ERR_PTR(error); return s; } - error = dirfd_path_init(nd); - if (unlikely(error)) - return ERR_PTR(error); + if (likely(!nd->path.mnt)) { + error = dirfd_path_init(nd); + if (unlikely(error)) + return ERR_PTR(error); + } return s; } diff --git a/include/linux/namei.h b/include/linux/namei.h index 9138b4471dbf..7bc819ad0cd3 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -50,6 +50,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_EMPTY 0x4000 #define LOOKUP_DOWN 0x8000 +/* Scoping flags for lookup. */ +#define LOOKUP_BENEATH 0x010000 /* No escaping from starting point. */ +#define LOOKUP_XDEV 0x020000 /* No mountpoint crossing. */ +#define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ +#define LOOKUP_NO_SYMLINKS 0x080000 /* No symlink crossing *at all*. + Implies LOOKUP_NO_MAGICLINKS. */ + extern int path_pts(struct path *path); extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); From patchwork Mon May 20 13:33:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951137 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D764912 for ; Mon, 20 May 2019 13:35:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3D49E2873A for ; Mon, 20 May 2019 13:35:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 30D91287F9; Mon, 20 May 2019 13:35:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 157B0287E9 for ; Mon, 20 May 2019 13:35:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388043AbfETNfD (ORCPT ); Mon, 20 May 2019 09:35:03 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:61162 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728634AbfETNfD (ORCPT ); Mon, 20 May 2019 09:35:03 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id C877750122; Mon, 20 May 2019 15:35:00 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id csTDFD4X9caf; Mon, 20 May 2019 15:34:58 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Christian Brauner , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 06/10] namei: LOOKUP_IN_ROOT: chroot-like path resolution Date: Mon, 20 May 2019 23:33:01 +1000 Message-Id: <20190520133305.11925-7-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The primary motivation for the need for this flag is container runtimes which have to interact with malicious root filesystems in the host namespaces. One of the first requirements for a container runtime to be secure against a malicious rootfs is that they correctly scope symlinks (that is, they should be scoped as though they are chroot(2)ed into the container's rootfs) and ".."-style paths[*]. The already-existing LOOKUP_XDEV and LOOKUP_NO_MAGICLINKS help defend against other potential attacks in a malicious rootfs scenario. Currently most container runtimes try to do this resolution in userspace[1], causing many potential race conditions. In addition, the "obvious" alternative (actually performing a {ch,pivot_}root(2)) requires a fork+exec (for some runtimes) which is *very* costly if necessary for every filesystem operation involving a container. [*] At the moment, ".." and magic-link jumping are disallowed for the same reason it is disabled for LOOKUP_BENEATH -- currently it is not safe to allow it. Future patches may enable it unconditionally once we have resolved the possible races (for "..") and semantics (for magic-link jumping). The most significant *at(2) semantic change with LOOKUP_IN_ROOT is that absolute pathnames no longer cause dirfd to be ignored completely. The rationale is that LOOKUP_IN_ROOT must necessarily chroot-scope symlinks with absolute paths to dirfd, and so doing it for the base path seems to be the most consistent behaviour (and also avoids foot-gunning users who want to scope paths that are absolute). [1]: https://github.com/cyphar/filepath-securejoin Co-developed-by: Christian Brauner Signed-off-by: Aleksa Sarai --- fs/namei.c | 6 +++--- include/linux/namei.h | 1 + 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index f997c82eb9c2..d18671a06bdb 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1137,7 +1137,7 @@ const char *get_link(struct nameidata *nd, bool trailing) if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS)) return ERR_PTR(-ELOOP); /* Not currently safe. */ - if (unlikely(nd->flags & LOOKUP_BENEATH)) + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) return ERR_PTR(-EXDEV); /* * For trailing_symlink we check whether the symlink's @@ -1827,7 +1827,7 @@ static inline int handle_dots(struct nameidata *nd, int type) * cause our parent to have moved outside of the root and us to skip * over it. */ - if (unlikely(nd->flags & LOOKUP_BENEATH)) + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) return -EXDEV; if (!nd->root.mnt) set_root(nd); @@ -2378,7 +2378,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->m_seq = read_seqbegin(&mount_lock); - if (unlikely(nd->flags & LOOKUP_BENEATH)) { + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) { error = dirfd_path_init(nd); if (unlikely(error)) return ERR_PTR(error); diff --git a/include/linux/namei.h b/include/linux/namei.h index 7bc819ad0cd3..4b1ee717cb14 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -56,6 +56,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ #define LOOKUP_NO_SYMLINKS 0x080000 /* No symlink crossing *at all*. Implies LOOKUP_NO_MAGICLINKS. */ +#define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as %current->fs->root. */ extern int path_pts(struct path *path); From patchwork Mon May 20 13:33:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951145 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 66E33912 for ; Mon, 20 May 2019 13:35:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5497428882 for ; Mon, 20 May 2019 13:35:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 522152888E; Mon, 20 May 2019 13:35:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A06C628882 for ; Mon, 20 May 2019 13:35:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388413AbfETNfQ (ORCPT ); Mon, 20 May 2019 09:35:16 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:61828 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731029AbfETNfQ (ORCPT ); Mon, 20 May 2019 09:35:16 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 99D644FCEB; Mon, 20 May 2019 15:35:12 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id 1xk0cINnSUEo; Mon, 20 May 2019 15:35:10 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Jann Horn , Kees Cook , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 07/10] namei: aggressively check for nd->root escape on ".." resolution Date: Mon, 20 May 2019 23:33:02 +1000 Message-Id: <20190520133305.11925-8-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch allows for LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution (in the case of LOOKUP_BENEATH the resolution will still fail if ".." resolution would resolve a path outside of the root -- while LOOKUP_IN_ROOT will chroot(2)-style scope it). magic-link jumps are still disallowed entirely because now they could result in inconsistent behaviour if resolution encounters a subsequent "..". The need for this patch is explained by observing there is a fairly easy-to-exploit race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be used to "skip over" nd->root and thus escape to the filesystem above nd->root. thread1 [attacker]: for (;;) renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); thread2 [victim]: for (;;) resolveat(dirb, "b/c/../../etc/shadow", RESOLVE_IN_ROOT); With fairly significant regularity, thread2 will resolve to "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar (though somewhat more privileged) attack using MS_MOVE. With this patch, such cases will be detected *during* ".." resolution (which is the weak point of chroot(2) -- since walking *into* a subdirectory tautologically cannot result in you walking *outside* nd->root -- except through a bind-mount or magic-link). By detecting this at ".." resolution (rather than checking only at the end of the entire resolution) we can both correct escapes by jumping back to the root (in the case of LOOKUP_IN_ROOT), as well as avoid revealing to attackers the structure of the filesystem outside of the root (through timing attacks for instance). In order to avoid a quadratic lookup with each ".." entry, we only activate the slow path if a write through &rename_lock or &mount_lock has occurred during path resolution (&rename_lock and &mount_lock are re-taken to further optimise the lookup). Since the primary attack being protected against is MS_MOVE or rename(2), not doing additional checks unless a mount or rename have occurred avoids making the common case slow. The use of path_is_under() here might seem suspect, but on further inspection of the most important race (a path was *inside* the root but is now *outside*), there appears to be no attack potential: * If path_is_under() occurs before the rename, then the path will be resolved -- however the path was originally inside the root and thus there is no escape (and to userspace it'd look like the rename occurred after the path was resolved). If path_is_under() occurs afterwards, the resolution is blocked. * Subsequent ".." jumps are guaranteed to check path_is_under() -- by construction, &rename_lock or &mount_lock must have been taken by the attacker after path_is_under() returned in the victim. Thus ".." will not be able to escape from the previously-inside-root path. * Walking down in the moved path is still safe since the entire subtree was moved (either by rename(2) or MS_MOVE) and because (as discussed above) walking down is safe. I have run a variant of the above attack in a loop on several machines with this patch, and no instances of a breakout were detected. While this is not concrete proof that this is safe, when combined with the above argument it should lend some trustworthiness to this construction. Cc: Al Viro Cc: Jann Horn Cc: Kees Cook Signed-off-by: Aleksa Sarai --- fs/namei.c | 48 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index d18671a06bdb..6c3bbe627965 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -491,7 +491,7 @@ struct nameidata { struct path root; struct inode *inode; /* path.dentry.d_inode */ unsigned int flags; - unsigned seq, m_seq; + unsigned seq, m_seq, r_seq; int last_type; unsigned depth; int total_link_count; @@ -1822,19 +1822,35 @@ static inline int may_lookup(struct nameidata *nd) static inline int handle_dots(struct nameidata *nd, int type) { if (type == LAST_DOTDOT) { - /* - * LOOKUP_BENEATH resolving ".." is not currently safe -- races can - * cause our parent to have moved outside of the root and us to skip - * over it. - */ - if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) - return -EXDEV; + int error = 0; + if (!nd->root.mnt) set_root(nd); - if (nd->flags & LOOKUP_RCU) { - return follow_dotdot_rcu(nd); - } else - return follow_dotdot(nd); + if (nd->flags & LOOKUP_RCU) + error = follow_dotdot_rcu(nd); + else + error = follow_dotdot(nd); + if (error) + return error; + + if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) { + bool m_retry = read_seqretry(&mount_lock, nd->m_seq); + bool r_retry = read_seqretry(&rename_lock, nd->r_seq); + + /* + * Don't bother checking unless there's a racing + * rename(2) or MS_MOVE. + */ + if (likely(!m_retry && !r_retry)) + return 0; + + if (m_retry && !(nd->flags & LOOKUP_RCU)) + nd->m_seq = read_seqbegin(&mount_lock); + if (r_retry) + nd->r_seq = read_seqbegin(&rename_lock); + if (!path_is_under(&nd->path, &nd->root)) + return -EXDEV; + } } return 0; } @@ -2355,6 +2371,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->last_type = LAST_ROOT; /* if there are only slashes... */ nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT; nd->depth = 0; + + nd->m_seq = read_seqbegin(&mount_lock); + if (unlikely(flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) + nd->r_seq = read_seqbegin(&rename_lock); + if (flags & LOOKUP_ROOT) { struct dentry *root = nd->root.dentry; struct inode *inode = root->d_inode; @@ -2365,7 +2386,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) if (flags & LOOKUP_RCU) { nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); nd->root_seq = nd->seq; - nd->m_seq = read_seqbegin(&mount_lock); } else { path_get(&nd->path); } @@ -2376,8 +2396,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.mnt = NULL; nd->path.dentry = NULL; - nd->m_seq = read_seqbegin(&mount_lock); - if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) { error = dirfd_path_init(nd); if (unlikely(error)) From patchwork Mon May 20 13:33:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951149 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB98F912 for ; Mon, 20 May 2019 13:35:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 99FE62888E for ; Mon, 20 May 2019 13:35:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8E00828852; Mon, 20 May 2019 13:35:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 07E9428831 for ; Mon, 20 May 2019 13:35:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730039AbfETNf2 (ORCPT ); Mon, 20 May 2019 09:35:28 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:62706 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730269AbfETNf2 (ORCPT ); Mon, 20 May 2019 09:35:28 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 56C9750C9B; Mon, 20 May 2019 15:35:25 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id 8It-CNw9qGUe; Mon, 20 May 2019 15:35:21 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Christian Brauner , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 08/10] namei: resolveat(2) syscall Date: Mon, 20 May 2019 23:33:03 +1000 Message-Id: <20190520133305.11925-9-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The most obvious syscall to add support for the new LOOKUP_* scoping flags would be openat(2) (along with the required execveat(2) change included in this series). However, there are a few reasons to not do this: * The new LOOKUP_* flags are intended to be security features, and openat(2) will silently ignore all unknown flags. This means that users would need to avoid foot-gunning themselves constantly when using this interface if it were part of openat(2). * Resolution scoping feels like a different operation to the existing O_* flags. And since openat(2) has limited flag space, it seems to be quite wasteful to clutter it with 5 flags that are all resolution-related. Arguably O_NOFOLLOw is also a resolution flag but its entire purpose is to error out if you encounter a trailing symlink not to scope resolution. * Other systems would be able to reimplement this syscall allowing for cross-OS standardisation rather than being hidden amongst O_* flags which may result in it not being used by all the parties that might want to use it (file servers, web servers, container runtimes, etc). * It gives us the opportunity to iterate on the O_PATH interface. In particular, the RESOLVE_UPGRADE_* flags are only possible because we have a clean slate without needing to re-use the ACC_MODE flag design. To this end, we introduce the resolveat(2) syscall. At the moment it's effectively another way of getting a bog-standard O_PATH descriptor but with the ability to use the new LOOKUP_* flags. Following the example of other new file-handle-creation syscalls (pidfds, seccomp notify fds) we default to O_CLOEXEC and users can unset O_CLOEXEC using fcntl(2) if they really want to. Because resolveat(2) only provides the ability to get O_PATH descriptors, users will need to get creative with /proc/self/fd in order to get a usable file descriptor for other uses. However, the new addition of O_EMPTYPATH shortcuts this problem and allows for easy usage of the new LOOKUP_* flags with openat(2) without cluttering the open(2) flag bits. In order to allow for userspace to lock down their usage of file descriptor re-opening, resolveat(2) has flags (RESOLVE_UPGRADE_*) which allow users to disallow certain re-opening modes. Co-developed-by: Christian Brauner Signed-off-by: Aleksa Sarai --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 3 ++ arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + fs/namei.c | 50 +++++++++++++++++++++ include/linux/fcntl.h | 8 +++- include/uapi/asm-generic/unistd.h | 5 ++- include/uapi/linux/fcntl.h | 10 +++++ 22 files changed, 91 insertions(+), 3 deletions(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 63ed39cbd3bd..72f431b1dc9c 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -461,5 +461,6 @@ 530 common getegid sys_getegid 531 common geteuid sys_geteuid 532 common getppid sys_getppid +533 common resolveat sys_resolveat # all other architectures have common numbers for new syscall, alpha # is the exception. diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 9016f4081bb9..11f19c78185e 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -437,3 +437,4 @@ 421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait 422 common futex_time64 sys_futex 423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval +435 common resolveat sys_resolveat diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index d1dd93436e1e..d04eb26cfaeb 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -44,7 +44,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 424 +#define __NR_compat_syscalls 436 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 5590f2623690..9978234748a9 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -867,6 +867,9 @@ __SYSCALL(__NR_futex_time64, sys_futex) #define __NR_sched_rr_get_interval_time64 423 __SYSCALL(__NR_sched_rr_get_interval_time64, sys_sched_rr_get_interval) +#define __NR_resolveat 435 +__SYSCALL(__NR_resolveat, sys_sched_rr_get_interval) + /* * Please add new compat syscalls above this comment and update * __NR_compat_syscalls in asm/unistd.h. diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index ab9cda5f6136..26a3daab202e 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -344,3 +344,4 @@ 332 common pkey_free sys_pkey_free 333 common rseq sys_rseq # 334 through 423 are reserved to sync up with other architectures +435 common resolveat sys_resolveat diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index 125c14178979..3fdbd42077f1 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -423,3 +423,4 @@ 421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait 422 common futex_time64 sys_futex 423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval +435 common resolveat sys_resolveat diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 8ee3a8c18498..6948436a939b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -429,3 +429,4 @@ 421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait 422 common futex_time64 sys_futex 423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval +435 common resolveat sys_resolveat diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 15f4117900ee..36795694e495 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -362,3 +362,4 @@ 421 n32 rt_sigtimedwait_time64 compat_sys_rt_sigtimedwait_time64 422 n32 futex_time64 sys_futex 423 n32 sched_rr_get_interval_time64 sys_sched_rr_get_interval +435 n32 resolveat sys_resolveat diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index c85502e67b44..5fcde2d31fe7 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -338,3 +338,4 @@ 327 n64 rseq sys_rseq 328 n64 io_pgetevents sys_io_pgetevents # 329 through 423 are reserved to sync up with other architectures +435 n64 resolveat sys_resolveat diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 2e063d0f837e..71152b788e37 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -411,3 +411,4 @@ 421 o32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64 422 o32 futex_time64 sys_futex sys_futex 423 o32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval +435 o32 resolveat sys_resolveat sys_resolveat diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index b26766c6647d..68e415658235 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -420,3 +420,4 @@ 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64 422 32 futex_time64 sys_futex sys_futex 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval +435 common resolveat sys_resolveat sys_resolveat diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index b18abb0c3dae..69b0bf1df8b1 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -505,3 +505,4 @@ 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64 422 32 futex_time64 sys_futex sys_futex 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval +435 common resolveat sys_resolveat sys_resolveat diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 02579f95f391..9610f18eb2ea 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -426,3 +426,4 @@ 421 32 rt_sigtimedwait_time64 - compat_sys_rt_sigtimedwait_time64 422 32 futex_time64 - sys_futex 423 32 sched_rr_get_interval_time64 - sys_sched_rr_get_interval +435 common resolveat sys_resolveat - diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index bfda678576e4..ed53ca7717cf 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -426,3 +426,4 @@ 421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait 422 common futex_time64 sys_futex 423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval +435 common resolveat sys_resolveat diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index b9a5a04b2d2c..41778972430b 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -469,3 +469,4 @@ 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64 422 32 futex_time64 sys_futex sys_futex 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval +435 common resolveat sys_resolveat sys_resolveat diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4cd5f982b1e5..f3cb515e52fb 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -438,3 +438,4 @@ 425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup 426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter 427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register +435 i386 resolveat sys_resolveat __ia32_sys_resolveat diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 64ca0d06259a..5f4a988f3bc9 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -355,6 +355,7 @@ 425 common io_uring_setup __x64_sys_io_uring_setup 426 common io_uring_enter __x64_sys_io_uring_enter 427 common io_uring_register __x64_sys_io_uring_register +435 common resolveat __x64_sys_resolveat # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 6af49929de85..2d5b57ebd6c1 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -394,3 +394,4 @@ 421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait 422 common futex_time64 sys_futex 423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval +435 common resolveat sys_resolveat diff --git a/fs/namei.c b/fs/namei.c index 6c3bbe627965..4b4eb6c624cb 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3790,6 +3790,56 @@ struct file *do_filp_open(int dfd, struct filename *pathname, return filp; } +SYSCALL_DEFINE3(resolveat, int, dfd, const char __user *, path, + unsigned long, flags) +{ + int fd; + struct filename *tmp; + struct open_flags op = { + .open_flag = O_PATH | O_CLOEXEC, + .opath_mask = FMODE_PATH_READ | FMODE_PATH_WRITE, + }; + + if (flags & ~VALID_RESOLVE_FLAGS) + return -EINVAL; + + if (flags & RESOLVE_UPGRADE_NOREAD) + op.opath_mask &= ~FMODE_PATH_READ; + if (flags & RESOLVE_UPGRADE_NOWRITE) + op.opath_mask &= ~FMODE_PATH_WRITE; + + if (!(flags & RESOLVE_NO_FOLLOW)) + op.lookup_flags |= LOOKUP_FOLLOW; + if (flags & RESOLVE_BENEATH) + op.lookup_flags |= LOOKUP_BENEATH; + if (flags & RESOLVE_XDEV) + op.lookup_flags |= LOOKUP_XDEV; + if (flags & RESOLVE_NO_MAGICLINKS) + op.lookup_flags |= LOOKUP_NO_MAGICLINKS; + if (flags & RESOLVE_NO_SYMLINKS) + op.lookup_flags |= LOOKUP_NO_SYMLINKS; + if (flags & RESOLVE_IN_ROOT) + op.lookup_flags |= LOOKUP_IN_ROOT; + + tmp = getname(path); + if (IS_ERR(tmp)) + return PTR_ERR(tmp); + + fd = get_unused_fd_flags(op.open_flag); + if (fd >= 0) { + struct file *f = do_filp_open(dfd, tmp, &op); + if (IS_ERR(f)) { + put_unused_fd(fd); + fd = PTR_ERR(f); + } else { + fsnotify_open(f); + fd_install(fd, f); + } + } + putname(tmp); + return fd; +} + struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt, const char *name, const struct open_flags *op) { diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index 2868ae6c8fc1..6356eb479765 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -4,13 +4,19 @@ #include -/* list of all valid flags for the open/openat flags argument: */ +/* List of all valid flags for the open/openat flags argument: */ #define VALID_OPEN_FLAGS \ (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \ O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_EMPTYPATH) +/* List of all valid flags for resolveat(2). */ +#define VALID_RESOLVE_FLAGS \ + (RESOLVE_UPGRADE_NOWRITE | RESOLVE_UPGRADE_NOREAD | \ + RESOLVE_NO_FOLLOW | RESOLVE_BENEATH | RESOLVE_XDEV | \ + RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | RESOLVE_IN_ROOT) + #ifndef force_o_largefile #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) #endif diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index dee7292e1df6..f788c9f7d374 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -833,8 +833,11 @@ __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) #define __NR_io_uring_register 427 __SYSCALL(__NR_io_uring_register, sys_io_uring_register) +#define __NR_resolveat 435 +__SYSCALL(__NR_resolveat, sys_resolveat) + #undef __NR_syscalls -#define __NR_syscalls 428 +#define __NR_syscalls 436 /* * 32 bit systems traditionally used different diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 1d338357df8a..185712fa9cc0 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -93,5 +93,15 @@ #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ +/* Bottom bit is reserved for RESOLVE_UPGRADE_NOEXEC. */ +#define RESOLVE_UPGRADE_NOWRITE 0x002 /* Disallow re-opening for write. */ +#define RESOLVE_UPGRADE_NOREAD 0x004 /* Disallow re-opening for read. */ +#define RESOLVE_NO_FOLLOW 0x008 /* Don't follow trailing symlinks. */ + +#define RESOLVE_BENEATH 0x010 /* Block "lexical" trickery like "..", symlinks, absolute paths, etc. */ +#define RESOLVE_XDEV 0x020 /* Block mount-point crossings (includes bind-mounts). */ +#define RESOLVE_NO_MAGICLINKS 0x040 /* Block procfs-style "magic" symlinks. */ +#define RESOLVE_NO_SYMLINKS 0x080 /* Block all symlinks (implies AT_NO_MAGICLINKS). */ +#define RESOLVE_IN_ROOT 0x100 /* Scope ".." and "/" resolution to dirfd (like chroot(2)). */ #endif /* _UAPI_LINUX_FCNTL_H */ From patchwork Mon May 20 13:33:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951155 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EEC71390 for ; Mon, 20 May 2019 13:35:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7E72728898 for ; Mon, 20 May 2019 13:35:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7291028893; Mon, 20 May 2019 13:35:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 04BBD28891 for ; Mon, 20 May 2019 13:35:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731517AbfETNfk (ORCPT ); Mon, 20 May 2019 09:35:40 -0400 Received: from mx1.mailbox.org ([80.241.60.212]:63538 "EHLO mx1.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725951AbfETNfk (ORCPT ); Mon, 20 May 2019 09:35:40 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 3B42E50122; Mon, 20 May 2019 15:35:37 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter04.heinlein-hosting.de (spamfilter04.heinlein-hosting.de [80.241.56.122]) (amavisd-new, port 10030) with ESMTP id SNqFqrdz-AGP; Mon, 20 May 2019 15:35:34 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 09/10] kselftest: save-and-restore errno to allow for %m formatting Date: Mon, 20 May 2019 23:33:04 +1000 Message-Id: <20190520133305.11925-10-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Previously, using "%m" in a ksft_* format string can result in strange output because the errno value wasn't saved before calling other libc functions. The solution is to simply save and restore the errno before we format the user-supplied format string. Signed-off-by: Aleksa Sarai --- tools/testing/selftests/kselftest.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h index 47e1d995c182..ce049af7415b 100644 --- a/tools/testing/selftests/kselftest.h +++ b/tools/testing/selftests/kselftest.h @@ -10,6 +10,7 @@ #ifndef __KSELFTEST_H #define __KSELFTEST_H +#include #include #include #include @@ -72,58 +73,68 @@ static inline void ksft_print_cnts(void) static inline void ksft_print_msg(const char *msg, ...) { + int saved_errno = errno; va_list args; va_start(args, msg); printf("# "); + errno = saved_errno; vprintf(msg, args); va_end(args); } static inline void ksft_test_result_pass(const char *msg, ...) { + int saved_errno = errno; va_list args; ksft_cnt.ksft_pass++; va_start(args, msg); printf("ok %d ", ksft_test_num()); + errno = saved_errno; vprintf(msg, args); va_end(args); } static inline void ksft_test_result_fail(const char *msg, ...) { + int saved_errno = errno; va_list args; ksft_cnt.ksft_fail++; va_start(args, msg); printf("not ok %d ", ksft_test_num()); + errno = saved_errno; vprintf(msg, args); va_end(args); } static inline void ksft_test_result_skip(const char *msg, ...) { + int saved_errno = errno; va_list args; ksft_cnt.ksft_xskip++; va_start(args, msg); printf("ok %d # skip ", ksft_test_num()); + errno = saved_errno; vprintf(msg, args); va_end(args); } static inline void ksft_test_result_error(const char *msg, ...) { + int saved_errno = errno; va_list args; ksft_cnt.ksft_error++; va_start(args, msg); printf("not ok %d # error ", ksft_test_num()); + errno = saved_errno; vprintf(msg, args); va_end(args); } @@ -143,10 +154,12 @@ static inline int ksft_exit_fail(void) static inline int ksft_exit_fail_msg(const char *msg, ...) { + int saved_errno = errno; va_list args; va_start(args, msg); printf("Bail out! "); + errno = saved_errno; vprintf(msg, args); va_end(args); @@ -169,10 +182,12 @@ static inline int ksft_exit_xpass(void) static inline int ksft_exit_skip(const char *msg, ...) { if (msg) { + int saved_errno = errno; va_list args; va_start(args, msg); printf("1..%d # Skipped: ", ksft_test_num()); + errno = saved_errno; vprintf(msg, args); va_end(args); } else { From patchwork Mon May 20 13:33:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 10951157 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 756F1912 for ; Mon, 20 May 2019 13:36:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 614102874B for ; Mon, 20 May 2019 13:36:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5493E287EA; Mon, 20 May 2019 13:36:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75AAB2888B for ; Mon, 20 May 2019 13:36:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730221AbfETNf5 (ORCPT ); Mon, 20 May 2019 09:35:57 -0400 Received: from mx2.mailbox.org ([80.241.60.215]:10828 "EHLO mx2.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725951AbfETNf4 (ORCPT ); Mon, 20 May 2019 09:35:56 -0400 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx2.mailbox.org (Postfix) with ESMTPS id C59DBA1156; Mon, 20 May 2019 15:35:51 +0200 (CEST) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter04.heinlein-hosting.de (spamfilter04.heinlein-hosting.de [80.241.56.122]) (amavisd-new, port 10030) with ESMTP id MekZNDxt4WEW; Mon, 20 May 2019 15:35:47 +0200 (CEST) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Christian Brauner , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-kselftest@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Subject: [PATCH RFC v8 10/10] selftests: add resolveat(2) selftests Date: Mon, 20 May 2019 23:33:05 +1000 Message-Id: <20190520133305.11925-11-cyphar@cyphar.com> In-Reply-To: <20190520133305.11925-1-cyphar@cyphar.com> References: <20190520133305.11925-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Test all of the various resolveat(2) flags, as well as how file descriptor re-opening works. A small stress-test of a symlink-rename attack is included to show that the protections against ".."-based attacks are sufficient. Signed-off-by: Aleksa Sarai --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/resolveat/.gitignore | 1 + tools/testing/selftests/resolveat/Makefile | 6 + tools/testing/selftests/resolveat/helpers.h | 195 +++++++++ .../selftests/resolveat/linkmode_test.c | 306 ++++++++++++++ .../selftests/resolveat/resolveat_test.c | 400 ++++++++++++++++++ 6 files changed, 909 insertions(+) create mode 100644 tools/testing/selftests/resolveat/.gitignore create mode 100644 tools/testing/selftests/resolveat/Makefile create mode 100644 tools/testing/selftests/resolveat/helpers.h create mode 100644 tools/testing/selftests/resolveat/linkmode_test.c create mode 100644 tools/testing/selftests/resolveat/resolveat_test.c diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 971fc8428117..f558d6f21c4b 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -37,6 +37,7 @@ TARGETS += powerpc TARGETS += proc TARGETS += pstore TARGETS += ptrace +TARGETS += resolveat TARGETS += rseq TARGETS += rtc TARGETS += seccomp diff --git a/tools/testing/selftests/resolveat/.gitignore b/tools/testing/selftests/resolveat/.gitignore new file mode 100644 index 000000000000..bd68f6c3fd07 --- /dev/null +++ b/tools/testing/selftests/resolveat/.gitignore @@ -0,0 +1 @@ +/*_test diff --git a/tools/testing/selftests/resolveat/Makefile b/tools/testing/selftests/resolveat/Makefile new file mode 100644 index 000000000000..375eaf4a55e7 --- /dev/null +++ b/tools/testing/selftests/resolveat/Makefile @@ -0,0 +1,6 @@ +CFLAGS += -g -I../../../../usr/include/ + +TEST_GEN_PROGS := linkmode_test resolveat_test + +include ../lib.mk + diff --git a/tools/testing/selftests/resolveat/helpers.h b/tools/testing/selftests/resolveat/helpers.h new file mode 100644 index 000000000000..c765f606cdbc --- /dev/null +++ b/tools/testing/selftests/resolveat/helpers.h @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#ifndef __RESOLVEAT_H__ +#define __RESOLVEAT_H__ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" + +#define ARRAY_LEN(X) (sizeof (X) / sizeof (*(X))) + +#ifndef __NR_resolveat +#define __NR_resolveat 435 +#define RESOLVE_UPGRADE_NOWRITE 0x002 /* Disallow re-opening for write. */ +#define RESOLVE_UPGRADE_NOREAD 0x004 /* Disallow re-opening for read. */ +#define RESOLVE_NO_FOLLOW 0x008 /* Don't follow trailing symlinks. */ +#define RESOLVE_BENEATH 0x010 /* Block "lexical" trickery like "..", symlinks, absolute paths, etc. */ +#define RESOLVE_XDEV 0x020 /* Block mount-point crossings (includes bind-mounts). */ +#define RESOLVE_NO_MAGICLINKS 0x040 /* Block procfs-style "magic" symlinks. */ +#define RESOLVE_NO_SYMLINKS 0x080 /* Block all symlinks (implies AT_NO_MAGICLINKS). */ +#define RESOLVE_IN_ROOT 0x100 /* Scope ".." and "/" resolution to dirfd (like chroot(2)). */ +#endif /* __NR_resolveat */ + +#ifndef O_EMPTYPATH +#define O_EMPTYPATH 040000000 +#endif /* O_EMPTYPATH */ + +#define E_func(func, ...) \ + do { \ + if (func(__VA_ARGS__) < 0) \ + ksft_exit_fail_msg("%s:%d %s failed\n", \ + __FILE__, __LINE__, #func);\ + } while (0) + +#define E_mkdirat(...) E_func(mkdirat, __VA_ARGS__) +#define E_symlinkat(...) E_func(symlinkat, __VA_ARGS__) +#define E_touchat(...) E_func(touchat, __VA_ARGS__) +#define E_readlink(...) E_func(readlink, __VA_ARGS__) +#define E_fstatat(...) E_func(fstatat, __VA_ARGS__) +#define E_asprintf(...) E_func(asprintf, __VA_ARGS__) +#define E_fchdir(...) E_func(fchdir, __VA_ARGS__) +#define E_mount(...) E_func(mount, __VA_ARGS__) +#define E_unshare(...) E_func(unshare, __VA_ARGS__) +#define E_setresuid(...) E_func(setresuid, __VA_ARGS__) +#define E_chmod(...) E_func(chmod, __VA_ARGS__) + +#define E_assert(expr, msg, ...) \ + do { \ + if (!(expr)) \ + ksft_exit_fail_msg("ASSERT(%s:%d) failed (%s): " msg "\n", \ + __FILE__, __LINE__, #expr, ##__VA_ARGS__); \ + } while (0) + +typedef int (*openfunc_t)(int dfd, const char *path, unsigned int flags); + +static int sys_resolveat(int dfd, const char *path, unsigned int flags) +{ + int ret = syscall(__NR_resolveat, dfd, path, flags); + return ret >= 0 ? ret : -errno; +} + +static int sys_openat(int dfd, const char *path, unsigned int flags) +{ + int ret = openat(dfd, path, flags); + return ret >= 0 ? ret : -errno; +} + +static int sys_execveat(int dfd, const char *path, + char *const argv[], char *const envp[], int flags) +{ + int ret = syscall(SYS_execveat, dfd, path, argv, envp, flags); + return ret >= 0 ? ret : -errno; +} + +static char *resolveat_flags(unsigned int flags) +{ + char *flagset, *p; + + E_asprintf(&flagset, "%s%s%s%s%s%s%s%s0", + (flags & RESOLVE_UPGRADE_NOWRITE) ? "RESOLVE_UPGRADE_NOWRITE|" : "", + (flags & RESOLVE_UPGRADE_NOREAD) ? "RESOLVE_UPGRADE_NOREAD|" : "", + (flags & RESOLVE_NO_FOLLOW) ? "RESOLVE_NO_FOLLOW|" : "", + (flags & RESOLVE_BENEATH) ? "RESOLVE_BENEATH|" : "", + (flags & RESOLVE_XDEV) ? "RESOLVE_XDEV|" : "", + (flags & RESOLVE_NO_MAGICLINKS) ? "RESOLVE_NO_MAGICLINKS|" : "", + (flags & RESOLVE_NO_SYMLINKS) ? "RESOLVE_NO_SYMLINKS|" : "", + (flags & RESOLVE_IN_ROOT) ? "RESOLVE_IN_ROOT|" : ""); + + /* Fix up the trailing |0. */ + p = strstr(flagset, "|0"); + if (p) + *p = '\0'; + return flagset; +} + +static char *openat_flags(unsigned int flags) +{ + char *flagset; + const char *modeflag = "(none)"; + + /* Handle the peculiarity of the ACC_MODE flags. */ + switch (flags & 0x03) { + case O_RDWR: + modeflag = "O_RDWR"; + break; + case O_RDONLY: + modeflag = "O_RDONLY"; + break; + case O_WRONLY: + modeflag = "O_WRONLY"; + break; + } + + /* TODO: Add more open flags. */ + E_asprintf(&flagset, "%s", modeflag); + return flagset; +} + +static int touchat(int dfd, const char *path) +{ + int fd = openat(dfd, path, O_CREAT); + if (fd >= 0) + close(fd); + return fd; +} + +static char *fdreadlink(int fd) +{ + char *target, *tmp; + + E_asprintf(&tmp, "/proc/self/fd/%d", fd); + + target = malloc(PATH_MAX); + if (!target) + ksft_exit_fail_msg("fdreadlink: malloc failed\n"); + memset(target, 0, PATH_MAX); + + E_readlink(tmp, target, PATH_MAX); + free(tmp); + return target; +} + +static bool fdequal(int fd, int dfd, const char *path) +{ + char *fdpath, *dfdpath, *other; + bool cmp; + + fdpath = fdreadlink(fd); + dfdpath = fdreadlink(dfd); + + if (!path) + E_asprintf(&other, "%s", dfdpath); + else if (*path == '/') + E_asprintf(&other, "%s", path); + else + E_asprintf(&other, "%s/%s", dfdpath, path); + + cmp = !strcmp(fdpath, other); + if (!cmp) + ksft_print_msg("fdequal: expected '%s' but got '%s'\n", other, fdpath); + + free(fdpath); + free(dfdpath); + free(other); + return cmp; +} + +static void test_resolveat_supported(void) +{ + int fd = sys_resolveat(AT_FDCWD, ".", 0); + if (fd == -ENOSYS) + ksft_exit_skip("resolveat(2) unsupported on this kernel\n"); + if (fd < 0) + ksft_exit_fail_msg("resolveat(2) supported check failed: %s\n", strerror(-fd)); + close(fd); +} + +#endif /* __RESOLVEAT_H__ */ diff --git a/tools/testing/selftests/resolveat/linkmode_test.c b/tools/testing/selftests/resolveat/linkmode_test.c new file mode 100644 index 000000000000..b60375099494 --- /dev/null +++ b/tools/testing/selftests/resolveat/linkmode_test.c @@ -0,0 +1,306 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" +#include "helpers.h" + +static mode_t fdmode(int fd) +{ + char *fdpath; + struct stat statbuf; + mode_t mode; + + E_asprintf(&fdpath, "/proc/self/fd/%d", fd); + E_fstatat(AT_FDCWD, fdpath, &statbuf, AT_SYMLINK_NOFOLLOW); + mode = (statbuf.st_mode & ~S_IFMT); + free(fdpath); + + return mode; +} + +static int reopen_proc(int fd, unsigned int flags) +{ + int ret, saved_errno; + char *fdpath; + + E_asprintf(&fdpath, "/proc/self/fd/%d", fd); + ret = open(fdpath, flags); + saved_errno = errno; + free(fdpath); + + return ret >= 0 ? ret : -saved_errno; +} + +static int reopen_oemptypath(int fd, unsigned int flags) +{ + int ret = openat(fd, "", O_EMPTYPATH | flags); + return ret >= 0 ? ret : -errno; +} + +struct reopen_test { + openfunc_t open; + mode_t chmod_mode; + struct { + unsigned int flags; + mode_t mode; + int err; + } orig, new; +}; + +static bool reopen(int fd, struct reopen_test *test) +{ + int newfd; + mode_t proc_mode; + bool failed = false; + + /* Check that the proc mode is correct. */ + proc_mode = fdmode(fd); + if (proc_mode != test->orig.mode) { + ksft_print_msg("incorrect fdmode (got[%o] != want[%o])\n", + proc_mode, test->orig.mode); + failed = true; + } + + /* Re-open through /proc. */ + newfd = reopen_proc(fd, test->new.flags); + if (newfd != test->new.err && (newfd < 0 || test->new.err < 0)) { + ksft_print_msg("/proc failure (%d != %d [%s])\n", + newfd, test->new.err, strerror(-test->new.err)); + failed = true; + } + if (newfd >= 0) { + proc_mode = fdmode(newfd); + if (proc_mode != test->new.mode) { + ksft_print_msg("/proc wrong fdmode (got[%o] != want[%o])\n", + proc_mode, test->new.mode); + failed = true; + } + close(newfd); + } + + /* Re-open with O_EMPTYPATH. */ + newfd = reopen_oemptypath(fd, test->new.flags); + if (newfd != test->new.err && (newfd < 0 || test->new.err < 0)) { + ksft_print_msg("O_EMPTYPATH failure (%d != %d [%s])\n", + newfd, test->new.err, strerror(-test->new.err)); + failed = true; + } + if (newfd >= 0) { + proc_mode = fdmode(newfd); + if (proc_mode != test->new.mode) { + ksft_print_msg("O_EMPTYPATH wrong fdmode (got[%o] != want[%o])\n", + proc_mode, test->new.mode); + failed = true; + } + close(newfd); + } + + return failed; +} + +void test_reopen_ordinary(bool privileged) +{ + int fd; + int err_access = privileged ? 0 : -EACCES; + char tmpfile[] = "/tmp/reopen-testfile.XXXXXX"; + + fd = mkstemp(tmpfile); + E_assert(fd >= 0, "mkstemp failed: %m\n"); + close(fd); + + struct reopen_test tests[] = { + /* Re-opening with the same mode should succeed. */ + { .open = sys_openat, .chmod_mode = 0400, + .orig.flags = O_RDONLY, .orig.mode = privileged ? 0777 : 0555, + .new.flags = O_RDONLY, .new.mode = privileged ? 0777 : 0555}, + { .open = sys_openat, .chmod_mode = 0200, + .orig.flags = O_WRONLY, .orig.mode = privileged ? 0777 : 0333, + .new.flags = O_WRONLY, .new.mode = privileged ? 0777 : 0333 }, + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_RDWR, .orig.mode = 0777, + .new.flags = O_RDWR, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_RDWR, .orig.mode = 0777, + .new.flags = O_RDONLY, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_RDWR, .orig.mode = 0777, + .new.flags = O_WRONLY, .new.mode = 0777 }, + + /* + * Upgrading the mode of a normal file works if the user had the + * required access at original-open time. + */ + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_RDONLY, .orig.mode = 0777, + .new.flags = O_WRONLY, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_WRONLY, .orig.mode = 0777, + .new.flags = O_RDONLY, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_RDONLY, .orig.mode = 0777, + .new.flags = O_RDWR, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0600, + .orig.flags = O_WRONLY, .orig.mode = 0777, + .new.flags = O_RDWR, .new.mode = 0777 }, + + /* However, re-open will be blocked given insufficient permissions. */ + { .open = sys_openat, .chmod_mode = 0400, + .orig.flags = O_RDONLY, .orig.mode = privileged ? 0777 : 0555, + .new.flags = O_WRONLY, .new.mode = 0777, .new.err = err_access }, + { .open = sys_openat, .chmod_mode = 0200, + .orig.flags = O_WRONLY, .orig.mode = privileged ? 0777 : 0333, + .new.flags = O_RDONLY, .new.mode = 0777, .new.err = err_access }, + { .open = sys_openat, .chmod_mode = 0400, + .orig.flags = O_RDONLY, .orig.mode = privileged ? 0777 : 0555, + .new.flags = O_RDWR, .new.mode = 0777, .new.err = err_access }, + { .open = sys_openat, .chmod_mode = 0200, + .orig.flags = O_WRONLY, .orig.mode = privileged ? 0777 : 0333, + .new.flags = O_RDWR, .new.mode = 0777, .new.err = err_access }, + + /* O_PATH re-opens (of ordinary files) will always work. */ + { .open = sys_openat, .chmod_mode = 0000, + .orig.flags = O_PATH, .orig.mode = 0777, + .new.flags = O_WRONLY, .new.mode = 0777 }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = 0, .orig.mode = 0777, + .new.flags = O_WRONLY, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0000, + .orig.flags = O_PATH, .orig.mode = 0777, + .new.flags = O_RDONLY, .new.mode = 0777 }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = 0, .orig.mode = 0777, + .new.flags = O_RDONLY, .new.mode = 0777 }, + { .open = sys_openat, .chmod_mode = 0000, + .orig.flags = O_PATH, .orig.mode = 0777, + .new.flags = O_RDWR, .new.mode = 0777 }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = 0, .orig.mode = 0777, + .new.flags = O_RDWR, .new.mode = 0777 }, + + /* + * resolveat(2) RESOLVE_UPGRADE_NO* flags. In the privileged case, the + * re-open will work but the mode will still be scoped to the mode + * (or'd with the open acc_mode). + */ + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOREAD | RESOLVE_UPGRADE_NOWRITE, + .orig.mode = 0111, + .new.flags = O_RDONLY, .new.mode = 0555, .new.err = err_access }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOREAD | RESOLVE_UPGRADE_NOWRITE, + .orig.mode = 0111, + .new.flags = O_WRONLY, .new.mode = 0333, .new.err = err_access }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOREAD | RESOLVE_UPGRADE_NOWRITE, + .orig.mode = 0111, + .new.flags = O_RDWR, .new.mode = 0777, .new.err = err_access }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOWRITE, + .orig.mode = 0555, + .new.flags = O_RDONLY, .new.mode = 0555 }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOREAD, + .orig.mode = 0333, + .new.flags = O_WRONLY, .new.mode = 0333 }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOWRITE, + .orig.mode = 0555, + .new.flags = O_RDONLY, .new.mode = 0555 }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOREAD, + .orig.mode = 0333, + .new.flags = O_RDONLY, .new.mode = 0777, .new.err = err_access }, + { .open = sys_resolveat, .chmod_mode = 0000, + .orig.flags = RESOLVE_UPGRADE_NOWRITE, + .orig.mode = 0555, + .new.flags = O_WRONLY, .new.mode = 0777, .new.err = err_access }, + }; + + for (int i = 0; i < ARRAY_LEN(tests); i++) { + int fd; + char *orig_flagset, *new_flagset; + struct reopen_test *test = &tests[i]; + void (*resultfn)(const char *msg, ...) = ksft_test_result_pass; + + E_chmod(tmpfile, test->chmod_mode); + + fd = test->open(AT_FDCWD, tmpfile, test->orig.flags); + E_assert(fd >= 0, "open '%s' failed: %m\n", tmpfile); + + /* Make sure that any EACCES we see is not from inode permissions. */ + E_chmod(tmpfile, 0777); + + if (reopen(fd, test)) + resultfn = ksft_test_result_fail; + + close(fd); + + new_flagset = openat_flags(test->new.flags); + if (test->open == sys_openat) + orig_flagset = openat_flags(test->orig.flags); + else if (test->open == sys_resolveat) + orig_flagset = resolveat_flags(test->orig.flags); + else + ksft_exit_fail_msg("unknown test->open\n"); + + resultfn("%sordinary reopen of (orig[%s]=%s, new=%s) chmod=%.3o %s\n", + privileged ? "privileged " : "", + test->open == sys_openat ? "openat" : "resolveat", + orig_flagset, new_flagset, test->chmod_mode, + test->new.err < 0 ? strerror(-test->new.err) : "works"); + + free(new_flagset); + free(orig_flagset); + } + + unlink(tmpfile); +} + + +int main(int argc, char **argv) +{ + bool privileged; + + ksft_print_header(); + test_resolveat_supported(); + + /* + * Technically we should be checking CAP_DAC_OVERRIDE, but it's easier to + * just assume that euid=0 has the full capability set. + */ + privileged = (geteuid() == 0); + if (!privileged) + ksft_test_result_skip("privileged tests require euid == 0\n"); + else { + test_reopen_ordinary(privileged); + + E_setresuid(65534, 65534, 65534); + privileged = (geteuid() == 0); + } + + test_reopen_ordinary(privileged); + + if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0) + ksft_exit_fail(); + else + ksft_exit_pass(); +} diff --git a/tools/testing/selftests/resolveat/resolveat_test.c b/tools/testing/selftests/resolveat/resolveat_test.c new file mode 100644 index 000000000000..72f2e8c5dfe0 --- /dev/null +++ b/tools/testing/selftests/resolveat/resolveat_test.c @@ -0,0 +1,400 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" +#include "helpers.h" + +/* + * Construct a test directory with the following structure: + * + * root/ + * |-- procexe -> /proc/self/exe + * |-- procroot -> /proc/self/root + * |-- root/ + * |-- mnt/ [mountpoint] + * | |-- self -> ../mnt/ + * | `-- absself -> /mnt/ + * |-- etc/ + * | `-- passwd + * |-- relsym -> etc/passwd + * |-- abssym -> /etc/passwd + * |-- abscheeky -> /cheeky + * |-- abscheeky -> /cheeky + * `-- cheeky/ + * |-- absself -> / + * |-- self -> ../../root/ + * |-- garbageself -> /../../root/ + * |-- passwd -> ../cheeky/../cheeky/../etc/../etc/passwd + * |-- abspasswd -> /../cheeky/../cheeky/../etc/../etc/passwd + * |-- dotdotlink -> ../../../../../../../../../../../../../../etc/passwd + * `-- garbagelink -> /../../../../../../../../../../../../../../etc/passwd + */ +int setup_testdir(void) +{ + int dfd, tmpfd; + char dirname[] = "/tmp/resolveat-testdir.XXXXXX"; + + /* Unshare and make /tmp a new directory. */ + E_unshare(CLONE_NEWNS); + E_mount("", "/tmp", "", MS_PRIVATE, ""); + + /* Make the top-level directory. */ + if (!mkdtemp(dirname)) + ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n"); + dfd = open(dirname, O_PATH | O_DIRECTORY); + if (dfd < 0) + ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n"); + + /* A sub-directory which is actually used for tests. */ + E_mkdirat(dfd, "root", 0755); + tmpfd = openat(dfd, "root", O_PATH | O_DIRECTORY); + if (tmpfd < 0) + ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n"); + close(dfd); + dfd = tmpfd; + + E_symlinkat("/proc/self/exe", dfd, "procexe"); + E_symlinkat("/proc/self/root", dfd, "procroot"); + E_mkdirat(dfd, "root", 0755); + + /* There is no mountat(2), so use chdir. */ + E_mkdirat(dfd, "mnt", 0755); + E_fchdir(dfd); + E_mount("tmpfs", "./mnt", "tmpfs", MS_NOSUID | MS_NODEV, ""); + E_symlinkat("../mnt/", dfd, "mnt/self"); + E_symlinkat("/mnt/", dfd, "mnt/absself"); + + E_mkdirat(dfd, "etc", 0755); + E_touchat(dfd, "etc/passwd"); + + E_symlinkat("etc/passwd", dfd, "relsym"); + E_symlinkat("/etc/passwd", dfd, "abssym"); + E_symlinkat("/cheeky", dfd, "abscheeky"); + + E_mkdirat(dfd, "cheeky", 0755); + + E_symlinkat("/", dfd, "cheeky/absself"); + E_symlinkat("../../root/", dfd, "cheeky/self"); + E_symlinkat("/../../root/", dfd, "cheeky/garbageself"); + + E_symlinkat("../cheeky/../etc/../etc/passwd", dfd, "cheeky/passwd"); + E_symlinkat("/../cheeky/../etc/../etc/passwd", dfd, "cheeky/abspasswd"); + + E_symlinkat("../../../../../../../../../../../../../../etc/passwd", + dfd, "cheeky/dotdotlink"); + E_symlinkat("/../../../../../../../../../../../../../../etc/passwd", + dfd, "cheeky/garbagelink"); + + return dfd; +} + +struct basic_test { + const char *dir; + const char *path; + unsigned int flags; + bool pass; + union { + int err; + const char *path; + } out; +}; + +void test_resolveat_basic_tests(void) +{ + int rootfd; + char *procselfexe; + + E_asprintf(&procselfexe, "/proc/%d/exe", getpid()); + rootfd = setup_testdir(); + + struct basic_test tests[] = { + /** RESOLVE_BENEATH **/ + /* Attempts to cross dirfd should be blocked. */ + { .path = "/", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "cheeky/absself", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abscheeky/absself", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "..", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "../root/", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "cheeky/self", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abscheeky/self", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "cheeky/garbageself", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abscheeky/garbageself", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + /* Only relative paths that stay inside dirfd should work. */ + { .path = "root", .flags = RESOLVE_BENEATH, + .out.path = "root", .pass = true }, + { .path = "etc", .flags = RESOLVE_BENEATH, + .out.path = "etc", .pass = true }, + { .path = "etc/passwd", .flags = RESOLVE_BENEATH, + .out.path = "etc/passwd", .pass = true }, + { .path = "relsym", .flags = RESOLVE_BENEATH, + .out.path = "etc/passwd", .pass = true }, + { .path = "cheeky/passwd", .flags = RESOLVE_BENEATH, + .out.path = "etc/passwd", .pass = true }, + { .path = "abscheeky/passwd", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abssym", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "/etc/passwd", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "cheeky/abspasswd", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abscheeky/abspasswd", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + /* Tricky paths should fail. */ + { .path = "cheeky/dotdotlink", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abscheeky/dotdotlink", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "cheeky/garbagelink", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .path = "abscheeky/garbagelink", .flags = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + + /** RESOLVE_IN_ROOT **/ + /* All attempts to cross the dirfd will be scoped-to-root. */ + { .path = "/", .flags = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .path = "cheeky/absself", .flags = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .path = "abscheeky/absself", .flags = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .path = "..", .flags = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .path = "../root/", .flags = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .path = "../root/", .flags = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .path = "cheeky/self", .flags = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .path = "cheeky/garbageself", .flags = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .path = "abscheeky/garbageself", .flags = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .path = "root", .flags = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .path = "etc", .flags = RESOLVE_IN_ROOT, + .out.path = "etc", .pass = true }, + { .path = "etc/passwd", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "relsym", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "cheeky/passwd", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "abscheeky/passwd", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "abssym", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "/etc/passwd", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "cheeky/abspasswd", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "abscheeky/abspasswd",.flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "cheeky/dotdotlink", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "abscheeky/dotdotlink", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "/../../../../abscheeky/dotdotlink", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "cheeky/garbagelink", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "abscheeky/garbagelink", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .path = "/../../../../abscheeky/garbagelink", .flags = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + + /** RESOLVE_XDEV **/ + /* Crossing *down* into a mountpoint is disallowed. */ + { .path = "mnt", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + { .path = "mnt/", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + { .path = "mnt/.", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + /* Crossing *up* out of a mountpoint is disallowed. */ + { .dir = "mnt", .path = ".", .flags = RESOLVE_XDEV, + .out.path = "mnt", .pass = true }, + { .dir = "mnt", .path = "..", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + { .dir = "mnt", .path = "../mnt", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + { .dir = "mnt", .path = "self", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + { .dir = "mnt", .path = "absself", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + /* Jumping to "/" is ok, but later components cannot cross. */ + { .dir = "mnt", .path = "/", .flags = RESOLVE_XDEV, + .out.path = "/", .pass = true }, + { .dir = "/", .path = "/", .flags = RESOLVE_XDEV, + .out.path = "/", .pass = true }, + { .path = "/proc/1", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + { .path = "/tmp", .flags = RESOLVE_XDEV, + .out.err = -EXDEV, .pass = false }, + + /** RESOLVE_NO_MAGICLINKS **/ + /* Regular symlinks should work. */ + { .path = "relsym", .flags = RESOLVE_NO_MAGICLINKS, + .out.path = "etc/passwd", .pass = true }, + /* Magic-links should not work. */ + { .path = "procexe", .flags = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "/proc/self/exe", .flags = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "procroot/etc", .flags = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "/proc/self/root/etc", .flags = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "/proc/self/root/etc", .flags = RESOLVE_NO_MAGICLINKS | RESOLVE_NO_FOLLOW, + .out.err = -ELOOP, .pass = false }, + { .path = "/proc/self/exe", .flags = RESOLVE_NO_MAGICLINKS | RESOLVE_NO_FOLLOW, + .out.path = procselfexe, .pass = true }, + + /** RESOLVE_NO_SYMLINKS **/ + /* Normal paths should work. */ + { .path = ".", .flags = RESOLVE_NO_SYMLINKS, + .out.path = NULL, .pass = true }, + { .path = "root", .flags = RESOLVE_NO_SYMLINKS, + .out.path = "root", .pass = true }, + { .path = "etc", .flags = RESOLVE_NO_SYMLINKS, + .out.path = "etc", .pass = true }, + { .path = "etc/passwd", .flags = RESOLVE_NO_SYMLINKS, + .out.path = "etc/passwd", .pass = true }, + /* Regular symlinks are blocked. */ + { .path = "relsym", .flags = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "abssym", .flags = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "cheeky/garbagelink", .flags = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "abscheeky/garbagelink", .flags = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .path = "abscheeky/absself", .flags = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + /* Trailing symlinks with NO_FOLLOW. */ + { .path = "relsym", .flags = RESOLVE_NO_SYMLINKS | RESOLVE_NO_FOLLOW, + .out.path = "relsym", .pass = true }, + { .path = "abssym", .flags = RESOLVE_NO_SYMLINKS | RESOLVE_NO_FOLLOW, + .out.path = "abssym", .pass = true }, + { .path = "cheeky/garbagelink", .flags = RESOLVE_NO_SYMLINKS | RESOLVE_NO_FOLLOW, + .out.path = "cheeky/garbagelink", .pass = true }, + { .path = "abscheeky/garbagelink", .flags = RESOLVE_NO_SYMLINKS | RESOLVE_NO_FOLLOW, + .out.err = -ELOOP, .pass = false }, + { .path = "abscheeky/absself", .flags = RESOLVE_NO_SYMLINKS | RESOLVE_NO_FOLLOW, + .out.err = -ELOOP, .pass = false }, + }; + + for (int i = 0; i < ARRAY_LEN(tests); i++) { + int dfd, fd; + bool failed; + void (*resultfn)(const char *msg, ...) = ksft_test_result_pass; + + struct basic_test *test = &tests[i]; + char *flagstr = resolveat_flags(test->flags); + + if (test->dir) + dfd = openat(rootfd, test->dir, O_PATH | O_DIRECTORY); + else + dfd = dup(rootfd); + if (dfd < 0) { + resultfn = ksft_test_result_error; + goto next; + } + + fd = sys_resolveat(dfd, test->path, test->flags); + if (test->pass) + failed = (fd < 0 || !fdequal(fd, rootfd, test->out.path)); + else + failed = (fd != test->out.err); + if (fd >= 0) + close(fd); + close(dfd); + + if (failed) + resultfn = ksft_test_result_fail; + +next: + if (test->pass) + resultfn("resolveat(root[%s], %s, %s) ==> %s\n", + test->dir ?: ".", test->path, flagstr, + test->out.path ?: "."); + else + resultfn("resolveat(root[%s], %s, %s) ==> %d (%s)\n", + test->dir ?: ".", test->path, flagstr, + test->out.err, strerror(-test->out.err)); + free(flagstr); + } + + free(procselfexe); + close(rootfd); +} + + +static int proc_exec(int fd) +{ + int err, saved_errno; + char *procpath; + char *argv[] = {"foo", NULL}; + char *envp[] = {"bar", NULL}; + + E_asprintf(&procpath, "/proc/self/fd/%d", fd); + err = execve(procpath, argv, envp); + saved_errno = errno; + free(procpath); + + return err >= 0 ? err : -saved_errno; +} + +static int fd_exec(int fd) +{ + char *argv[] = {"foo", NULL}; + char *envp[] = {"bar", NULL}; + + return sys_execveat(fd, "", argv, envp, AT_EMPTY_PATH); +} + +int main(int argc, char **argv) +{ + ksft_print_header(); + test_resolveat_supported(); + + /* NOTE: We should be checking for CAP_SYS_ADMIN here... */ + if (geteuid() != 0) + ksft_exit_skip("resolveat(2) tests require euid == 0\n"); + + test_resolveat_basic_tests(); + + if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0) + ksft_exit_fail(); + else + ksft_exit_pass(); +}