From patchwork Thu Sep 7 20:24:50 2023
X-Patchwork-Submitter: "Guilherme G. Piccoli"
X-Patchwork-Id: 13376857
From: "Guilherme G. Piccoli"
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org, kernel-dev@igalia.com, kernel@gpiccoli.net, keescook@chromium.org, ebiederm@xmission.com, oleg@redhat.com, yzaikin@google.com, mcgrof@kernel.org, akpm@linux-foundation.org, brauner@kernel.org, viro@zeniv.linux.org.uk, willy@infradead.org, david@redhat.com, dave@stgolabs.net, sonicadvance1@gmail.com, joshua@froggi.es, "Guilherme G. Piccoli"
Subject: [RFC PATCH 1/2] binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs
Date: Thu, 7 Sep 2023 17:24:50 -0300
Message-ID: <20230907204256.3700336-2-gpiccoli@igalia.com>
In-Reply-To: <20230907204256.3700336-1-gpiccoli@igalia.com>
References: <20230907204256.3700336-1-gpiccoli@igalia.com>

The procfs symlink /proc/self/exe points to the executable binary of the
running process/thread. For binfmt_misc interpreted binaries, though, it
effectively points to the *interpreter*, which is valid since the
interpreter is in fact the binary that is running.

But there are cases in which this is considered a limitation - see for
example the case of Linux architecture emulators, like FEX [0]. A binary
running under such an emulator could check its own symlink and get an
invalid result: it points to the emulator (its interpreter) instead. This
adds overhead to the emulation process, which must trap accesses to the
symlink to guarantee it is exposed properly to the emulated binary.

Hence, add the flag 'I' to binfmt_misc to allow overriding the default
behavior of mapping the interpreter as /proc/self/exe - with this flag,
the *interpreted* file is exposed in procfs instead.

[0] https://github.com/FEX-Emu/FEX

Suggested-by: Ryan Houdek
Tested-by: Ryan Houdek
Signed-off-by: Guilherme G. Piccoli
---

Some design decisions / questions:

(a) The patch uses interpreted_file == NULL to tell whether the flag is
set or not. In other words: in any case other than binfmt_misc with flag
'I' set, this pointer is NULL and the real exe_file is retrieved. This
way, we don't need to propagate a flag all the way from binfmt_misc up to
the procfs code. Does it make sense?

(b) Of course there are various places affected by this change for which
I'm not sure whether we should care, i.e., whether we need to change their
behavior as well. Examples are the audit code, tomoyo, etc. Even worse is
the case of users setting the exe_file through prctl; what to do in these
cases? I've marked them in the code using comments starting with
FIXME (BINPRM_FLAGS_EXPOSE_INTERP) so we can debate.

(c) Keeping interpreted_file in mm_struct *seems* to make sense to me,
though I'm not really sure of the impact of this new member there, for
cache locality or anything else I'm not seeing. An alternative would be a
small struct exec_files{} that contains both exe_file and
interpreted_file, if that's somehow better...

(d) Naming: both "interpreted_file" and the flag "I" were simple choices
and are open to any community suggestion; I'm not attached to them in any
way.

There are probably more implicit design decisions here and any feedback is
greatly appreciated!

Thanks in advance,
Guilherme

 Documentation/admin-guide/binfmt-misc.rst |  11 +++
 arch/arc/kernel/troubleshoot.c            |   5 ++
 fs/binfmt_elf.c                           |   7 ++
 fs/binfmt_misc.c                          |  11 +++
 fs/coredump.c                             |   5 ++
 fs/exec.c                                 |  18 +++-
 fs/proc/base.c                            |   2 +-
 include/linux/binfmts.h                   |   3 +
 include/linux/mm.h                        |   6 +-
 include/linux/mm_types.h                  |   1 +
 kernel/audit.c                            |   5 ++
 kernel/audit_watch.c                      |   7 +-
 kernel/fork.c                             | 105 ++++++++++++++++------
 kernel/signal.c                           |   7 +-
 kernel/sys.c                              |   5 ++
 kernel/taskstats.c                        |   7 +-
 security/tomoyo/util.c                    |   5 ++
 17 files changed, 173 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/binfmt-misc.rst b/Documentation/admin-guide/binfmt-misc.rst
index 59cd902e3549..175fca8439d6 100644
--- a/Documentation/admin-guide/binfmt-misc.rst
+++ b/Documentation/admin-guide/binfmt-misc.rst
@@ -88,6 +88,17 @@ Here is what the fields mean:
             emulation is installed and uses the opened image to spawn the
             emulator, meaning it is always available once installed,
             regardless of how the environment changes.
+      ``I`` - expose the interpreted file in the /proc/self/exe symlink
+            By default, binfmt_misc executing binaries expose their interpreter
+            as the /proc/self/exe file, which makes sense given that the actual
+            executable running is the interpreter indeed. But there are some
+            cases in which we want to change that behavior - imagine an emulator
+            of Linux binaries (of different architecture, for example) which
+            needs to deal with the different behaviors when running native - the
+            binary's symlink (/proc/self/exe) points to the binary itself - vs
+            the emulated case, whereas the link points to the interpreter. This
+            flag allows to change the default behavior and have the proc symlink
+            pointing to the **interpreted** file, not the interpreter.
 
 There are some restrictions:
 
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index d5b3ed2c58f5..d078af66f07b 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -62,6 +62,11 @@ static void print_task_path_n_nm(struct task_struct *tsk)
 	if (!mm)
 		goto done;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(mm);
 	mmput(mm);
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7b3d2d491407..fb0c22fa3635 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1162,6 +1162,13 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			}
 		}
 
+		/*
+		 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): here is one of our problems - bprm->file is
+		 * elf_mapped(), whereas our saved bprm->interpreted_file isn't. Now, why not just
+		 * map it, right? Because we're not sure how to (or if it's indeed necessary).
+		 * What if the interpreted file is not ELF? Could be anything that its interpreter
+		 * is able to read and execute...
+		 */
 		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
 				elf_prot, elf_flags, total_size);
 		if (BAD_ADDR(error)) {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index e0108d17b085..36350c3d73f5 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -48,6 +48,7 @@ enum {Enabled, Magic};
 #define MISC_FMT_OPEN_BINARY (1UL << 30)
 #define MISC_FMT_CREDENTIALS (1UL << 29)
 #define MISC_FMT_OPEN_FILE (1UL << 28)
+#define MISC_FMT_EXPOSE_INTERPRETED (1UL << 27)
 
 typedef struct {
 	struct list_head list;
@@ -181,6 +182,9 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto ret;
 
+	if (fmt->flags & MISC_FMT_EXPOSE_INTERPRETED)
+		bprm->interp_flags |= BINPRM_FLAGS_EXPOSE_INTERP;
+
 	if (fmt->flags & MISC_FMT_OPEN_FILE) {
 		interp_file = file_clone_open(fmt->interp_file);
 		if (!IS_ERR(interp_file))
@@ -258,6 +262,11 @@ static char *check_special_flags(char *sfs, Node *e)
 			p++;
 			e->flags |= MISC_FMT_OPEN_FILE;
 			break;
+		case 'I':
+			pr_debug("register: flag: I: (expose interpreted binary)\n");
+			p++;
+			e->flags |= MISC_FMT_EXPOSE_INTERPRETED;
+			break;
 		default:
 			cont = 0;
 		}
@@ -524,6 +533,8 @@ static void entry_status(Node *e, char *page)
 		*dp++ = 'C';
 	if (e->flags & MISC_FMT_OPEN_FILE)
 		*dp++ = 'F';
+	if (e->flags & MISC_FMT_EXPOSE_INTERPRETED)
+		*dp++ = 'I';
 	*dp++ = '\n';
 
 	if (!test_bit(Magic, &e->flags)) {
diff --git a/fs/coredump.c b/fs/coredump.c
index 9d235fa14ab9..1a771c7cba67 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -164,6 +164,11 @@ static int cn_print_exe_file(struct core_name *cn, bool name_only)
 	char *pathbuf, *path, *ptr;
 	int ret;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this coredump functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(current->mm);
 	if (!exe_file)
 		return cn_esc_printf(cn, "%s (path unknown)", current->comm);
diff --git a/fs/exec.c b/fs/exec.c
index 6518e33ea813..bb1574f37b67 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1280,7 +1280,8 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 * not visible until then. Doing it here also ensures
 	 * we don't race against replace_mm_exe_file().
 	 */
-	retval = set_mm_exe_file(bprm->mm, bprm->file);
+	retval = set_mm_exe_file(bprm->mm, bprm->file,
+				 bprm->interpreted_file);
 	if (retval)
 		goto out;
 
@@ -1405,6 +1406,13 @@ int begin_new_exec(struct linux_binprm * bprm)
 		fd_install(retval, bprm->executable);
 		bprm->executable = NULL;
 		bprm->execfd = retval;
+		/*
+		 * Since bprm->interpreted_file points to bprm->executable and
+		 * fd_install() consumes its refcount, we need to bump the refcount
+		 * here to avoid warnings as "file count is 0" on kernel log.
+		 */
+		if (unlikely(bprm->interp_flags & BINPRM_FLAGS_EXPOSE_INTERP))
+			get_file(bprm->interpreted_file);
 	}
 	return 0;
 
@@ -1500,6 +1508,8 @@ static void free_bprm(struct linux_binprm *bprm)
 		allow_write_access(bprm->file);
 		fput(bprm->file);
 	}
+	if (bprm->interpreted_file)
+		fput(bprm->interpreted_file);
 	if (bprm->executable)
 		fput(bprm->executable);
 	/* If a binfmt changed the interp, free it. */
@@ -1789,6 +1799,9 @@ static int exec_binprm(struct linux_binprm *bprm)
 		bprm->interpreter = NULL;
 
 		allow_write_access(exec);
+		if (unlikely(bprm->interp_flags & BINPRM_FLAGS_EXPOSE_INTERP))
+			bprm->interpreted_file = exec;
+
 		if (unlikely(bprm->have_execfd)) {
 			if (bprm->executable) {
 				fput(exec);
@@ -1796,7 +1809,8 @@ static int exec_binprm(struct linux_binprm *bprm)
 			}
 			bprm->executable = exec;
 		} else
-			fput(exec);
+			if (!(bprm->interp_flags & BINPRM_FLAGS_EXPOSE_INTERP))
+				fput(exec);
 	}
 
 	audit_bprm(bprm);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ffd54617c354..a13fbfc46997 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1727,7 +1727,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 	task = get_proc_task(d_inode(dentry));
 	if (!task)
 		return -ENOENT;
-	exe_file = get_task_exe_file(task);
+	exe_file = get_task_exe_file(task, true);
 	put_task_struct(task);
 	if (exe_file) {
 		*exe_path = exe_file->f_path;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8d51f69f9f5e..5dde52de7877 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -45,6 +45,7 @@ struct linux_binprm {
 		point_of_no_return:1;
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
+	struct file *interpreted_file; /* only for binfmt_misc with flag I */
 	struct file *file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -75,6 +76,8 @@ struct linux_binprm {
 #define BINPRM_FLAGS_PRESERVE_ARGV0_BIT 3
 #define BINPRM_FLAGS_PRESERVE_ARGV0 (1 << BINPRM_FLAGS_PRESERVE_ARGV0_BIT)
 
+#define BINPRM_FLAGS_EXPOSE_INTERP_BIT 4
+#define BINPRM_FLAGS_EXPOSE_INTERP (1 << BINPRM_FLAGS_EXPOSE_INTERP_BIT)
 /*
  * This structure defines the functions that are used to load the binary formats that
  * linux accepts.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..a00b32906604 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3255,10 +3255,12 @@ static inline int check_data_rlimit(unsigned long rlim,
 extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 
-extern int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file,
+			   struct file *new_interpreted_file);
 extern int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
-extern struct file *get_task_exe_file(struct task_struct *task);
+extern struct file *get_task_exe_file(struct task_struct *task,
+				      bool prefer_interpreted);
 
 extern bool may_expand_vm(struct mm_struct *, vm_flags_t, unsigned long npages);
 extern void vm_stat_account(struct mm_struct *, vm_flags_t, long npages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..346f81875f3e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -842,6 +842,7 @@ struct mm_struct {
 
 		/* store ref to file /proc/<pid>/exe symlink points to */
 		struct file __rcu *exe_file;
+		struct file __rcu *interpreted_file; /* see binfmt_misc flag I */
 #ifdef CONFIG_MMU_NOTIFIER
 		struct mmu_notifier_subscriptions *notifier_subscriptions;
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 16205dd29843..83c64c376c0c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2197,6 +2197,11 @@ void audit_log_d_path_exe(struct audit_buffer *ab,
 	if (!mm)
 		goto out_null;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this audit functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(mm);
 	if (!exe_file)
 		goto out_null;
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 65075f1e4ac8..b8f947849fb2 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -527,7 +527,12 @@ int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark)
 	unsigned long ino;
 	dev_t dev;
 
-	exe_file = get_task_exe_file(tsk);
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): if using the binfmt_misc flag 'I', we diverge
+	 * here from proc_exe_link(), exposing the true exe_file (instead of the interpreted
+	 * binary as proc). Should we expose here the same exe_file as proc's one *always*?
+	 */
+	exe_file = get_task_exe_file(tsk, false);
 	if (!exe_file)
 		return 0;
 	ino = file_inode(exe_file)->i_ino;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b6d20dfb9a8..8c4824dcc433 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -628,12 +628,47 @@ void free_task(struct task_struct *tsk)
 }
 EXPORT_SYMBOL(free_task);
 
-static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
+/**
+ * __get_mm_exe_or_interp_file - helper that acquires a reference to the mm's
+ * executable file, or if prefer_interp is set, go with mm->interpreted_file
+ * instead.
+ *
+ * Returns %NULL if mm has no associated executable/interpreted file.
+ * User must release file via fput().
+ */
+static inline struct file *__get_mm_exe_or_interp_file(struct mm_struct *mm,
+						       bool prefer_interp)
 {
 	struct file *exe_file;
 
+	rcu_read_lock();
+
+	if (unlikely(prefer_interp))
+		exe_file = rcu_dereference(mm->interpreted_file);
+	else
+		exe_file = rcu_dereference(mm->exe_file);
+
+	if (exe_file && !get_file_rcu(exe_file))
+		exe_file = NULL;
+	rcu_read_unlock();
+	return exe_file;
+}
+
+struct file *get_mm_exe_file(struct mm_struct *mm)
+{
+	return __get_mm_exe_or_interp_file(mm, false);
+}
+
+static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
+{
+	struct file *exe_file, *interp_file;
+
 	exe_file = get_mm_exe_file(oldmm);
 	RCU_INIT_POINTER(mm->exe_file, exe_file);
+
+	interp_file = __get_mm_exe_or_interp_file(oldmm, true);
+	RCU_INIT_POINTER(mm->interpreted_file, interp_file);
+
 	/*
 	 * We depend on the oldmm having properly denied write access to the
 	 * exe_file already.
@@ -1279,6 +1314,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_owner(mm, p);
 	mm_pasid_init(mm);
 	RCU_INIT_POINTER(mm->exe_file, NULL);
+	RCU_INIT_POINTER(mm->interpreted_file, NULL);
 	mmu_notifier_subscriptions_init(mm);
 	init_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
@@ -1348,7 +1384,7 @@ static inline void __mmput(struct mm_struct *mm)
 	khugepaged_exit(mm); /* must run before exit_mmap */
 	exit_mmap(mm);
 	mm_put_huge_zero_page(mm);
-	set_mm_exe_file(mm, NULL);
+	set_mm_exe_file(mm, NULL, NULL);
 	if (!list_empty(&mm->mmlist)) {
 		spin_lock(&mmlist_lock);
 		list_del(&mm->mmlist);
@@ -1394,7 +1430,9 @@ EXPORT_SYMBOL_GPL(mmput_async);
 /**
  * set_mm_exe_file - change a reference to the mm's executable file
  *
- * This changes mm's executable file (shown as symlink /proc/[pid]/exe).
+ * This changes mm's executable file (shown as symlink /proc/[pid]/exe),
+ * and if new_interpreted_file != NULL, also sets this field (check the
+ * binfmt_misc documentation, flag 'I', for details about this).
  *
  * Main users are mmput() and sys_execve(). Callers prevent concurrent
  * invocations: in mmput() nobody alive left, in execve it happens before
@@ -1402,16 +1440,19 @@ EXPORT_SYMBOL_GPL(mmput_async);
  *
  * Can only fail if new_exe_file != NULL.
  */
-int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
+int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file,
+		    struct file *new_interpreted_file)
 {
 	struct file *old_exe_file;
+	struct file *old_interpreted_file;
 
 	/*
-	 * It is safe to dereference the exe_file without RCU as
-	 * this function is only called if nobody else can access
-	 * this mm -- see comment above for justification.
+	 * It is safe to dereference exe_file / interpreted_file
+	 * without RCU as this function is only called if nobody else
+	 * can access this mm -- see comment above for justification.
 	 */
 	old_exe_file = rcu_dereference_raw(mm->exe_file);
+	old_interpreted_file = rcu_dereference_raw(mm->interpreted_file);
 
 	if (new_exe_file) {
 		/*
@@ -1423,10 +1464,20 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 		get_file(new_exe_file);
 	}
 	rcu_assign_pointer(mm->exe_file, new_exe_file);
+
+	/* For this one we don't care about write access... */
+	if (new_interpreted_file)
+		get_file(new_interpreted_file);
+	rcu_assign_pointer(mm->interpreted_file, new_interpreted_file);
+
 	if (old_exe_file) {
 		allow_write_access(old_exe_file);
 		fput(old_exe_file);
 	}
+
+	if (old_interpreted_file)
+		fput(old_interpreted_file);
+
 	return 0;
 }
 
@@ -1436,6 +1487,12 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
  * This changes mm's executable file (shown as symlink /proc/[pid]/exe).
  *
  * Main user is sys_prctl(PR_SET_MM_MAP/EXE_FILE).
+ *
+ * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): imagine user performs the sys_prctl()
+ * aiming to change /proc/self/exe symlink - suppose user is interested
+ * in the executable path itself. With binfmt_misc flag 'I', this change
+ * **won't reflect** since procfs make use of interpreted_file. What to do
+ * in this case? Do we care?
  */
 int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
@@ -1482,31 +1539,15 @@ int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 }
 
 /**
- * get_mm_exe_file - acquire a reference to the mm's executable file
- *
- * Returns %NULL if mm has no associated executable file.
- * User must release file via fput().
- */
-struct file *get_mm_exe_file(struct mm_struct *mm)
-{
-	struct file *exe_file;
-
-	rcu_read_lock();
-	exe_file = rcu_dereference(mm->exe_file);
-	if (exe_file && !get_file_rcu(exe_file))
-		exe_file = NULL;
-	rcu_read_unlock();
-	return exe_file;
-}
-
-/**
- * get_task_exe_file - acquire a reference to the task's executable file
+ * get_task_exe_file - acquire a reference to the task's executable or
+ * interpreted file (only for procfs, when under binfmt_misc with flag 'I').
  *
  * Returns %NULL if task's mm (if any) has no associated executable file or
  * this is a kernel thread with borrowed mm (see the comment above get_task_mm).
  * User must release file via fput().
  */
-struct file *get_task_exe_file(struct task_struct *task)
+struct file *get_task_exe_file(struct task_struct *task,
+			       bool prefer_interpreted)
 {
 	struct file *exe_file = NULL;
 	struct mm_struct *mm;
@@ -1514,8 +1555,14 @@ struct file *get_task_exe_file(struct task_struct *task)
 	task_lock(task);
 	mm = task->mm;
 	if (mm) {
-		if (!(task->flags & PF_KTHREAD))
-			exe_file = get_mm_exe_file(mm);
+		if (!(task->flags & PF_KTHREAD)) {
+			if (unlikely(prefer_interpreted)) {
+				exe_file = __get_mm_exe_or_interp_file(mm, true);
+				if (!exe_file)
+					exe_file = get_mm_exe_file(mm);
+			} else
+				exe_file = get_mm_exe_file(mm);
+		}
 	}
 	task_unlock(task);
 	return exe_file;
diff --git a/kernel/signal.c b/kernel/signal.c
index 09019017d669..3a8d85a65c49 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1263,7 +1263,12 @@ static void print_fatal_signal(int signr)
 	struct pt_regs *regs = task_pt_regs(current);
 	struct file *exe_file;
 
-	exe_file = get_task_exe_file(current);
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): if using the binfmt_misc flag 'I', we diverge
+	 * here from proc_exe_link(), exposing the true exe_file (instead of the interpreted
+	 * binary as proc). Should we expose here the same exe_file as proc's one *always*?
+	 */
+	exe_file = get_task_exe_file(current, false);
 	if (exe_file) {
 		pr_info("%pD: %s: potentially unexpected fatal signal %d.\n",
 			exe_file, current->comm, signr);
diff --git a/kernel/sys.c b/kernel/sys.c
index 2410e3999ebe..17fab2f71443 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1912,6 +1912,11 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): please read the comment
+	 * on replace_mm_exe_file() to ponder about the divergence when
+	 * using binfmt_misc with flag 'I'.
+	 */
 	err = replace_mm_exe_file(mm, exe.file);
 exit:
 	fdput(exe);
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 8ce3fa0c19e2..a5d5afc1919a 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -157,7 +157,12 @@ static void send_cpu_listeners(struct sk_buff *skb,
 static void exe_add_tsk(struct taskstats *stats, struct task_struct *tsk)
 {
 	/* No idea if I'm allowed to access that here, now. */
-	struct file *exe_file = get_task_exe_file(tsk);
+	struct file *exe_file = get_task_exe_file(tsk, false);
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): if using the binfmt_misc flag 'I', we diverge
+	 * here from proc_exe_link(), exposing the true exe_file (instead of the interpreted
+	 * binary as proc). Should we expose here the same exe_file as proc's one *always*?
+	 */
 
 	if (exe_file) {
 		/* Following cp_new_stat64() in stat.c . */
diff --git a/security/tomoyo/util.c b/security/tomoyo/util.c
index 6799b1122c9d..844bdbd27240 100644
--- a/security/tomoyo/util.c
+++ b/security/tomoyo/util.c
@@ -971,6 +971,11 @@ const char *tomoyo_get_exe(void)
 
 	if (!mm)
 		return NULL;
+	/*
+	 * FIXME (BINPRM_FLAGS_EXPOSE_INTERP): observe that if using binfmt_misc
+	 * with flag 'I' set, this tomoyo functionality will diverge from the
+	 * /proc/self/exe symlink with regards of what executable is running.
+	 */
 	exe_file = get_mm_exe_file(mm);
 	if (!exe_file)
 		return NULL;

From patchwork Thu Sep 7 20:24:51 2023
X-Patchwork-Submitter: "Guilherme G. Piccoli"
X-Patchwork-Id: 13376858
From: "Guilherme G. Piccoli"
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org, kernel-dev@igalia.com, kernel@gpiccoli.net, keescook@chromium.org, ebiederm@xmission.com, oleg@redhat.com, yzaikin@google.com, mcgrof@kernel.org, akpm@linux-foundation.org, brauner@kernel.org, viro@zeniv.linux.org.uk, willy@infradead.org, david@redhat.com, dave@stgolabs.net, sonicadvance1@gmail.com, joshua@froggi.es, "Guilherme G. Piccoli"
Subject: [RFC PATCH 2/2] fork, procfs: Introduce /proc/self/interpreter symlink
Date: Thu, 7 Sep 2023 17:24:51 -0300
Message-ID: <20230907204256.3700336-3-gpiccoli@igalia.com>
In-Reply-To: <20230907204256.3700336-1-gpiccoli@igalia.com>
References: <20230907204256.3700336-1-gpiccoli@igalia.com>
Sbla1XJHdA1bUmYebs0Aya6dIgD2Ozk8I/tZS0/S+juLTRKJJMrbnqBER+k5ePweQikJYBw4TMYCfDz7dkhK+yHydUST1kUiYVSMWrSHJo88P2lgfse0D7gldoO2EO3+R7E9qf8M4aKcVfJkVEq/Ro4Evkik2MWR+hDs2hF+9hVNenAqZRvHF21NwDnN2M0R4GWSnORtRsoAxPLvF7QSu4szN9+wuixe8FTJ/ahekczKTXDlEPdWnOExfF/qBkKWq3eJy806u5t70Duxmu7AsDvoJxxLZHgNcOtKf+zoztYr9+mwDXmKeeCvCfYn7hM5IDGxk X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently we have the /proc/self/exe_file symlink, that points to the executable binary of the running process - or to the *interpreted* file if flag 'I' is set on binfmt_misc. In this second case, we then lose the ability of having a symlink to the interpreter. Introduce hereby the /proc/self/interpreter symlink which always points to the *interpreter* when we're under interpretation, like on binfmt_misc use. We don't require to have a new file pointer since mm_struct contains exe_file, which points to such interpreter. Suggested-by: Ryan Houdek Tested-by: Ryan Houdek Signed-off-by: Guilherme G. Piccoli --- Design choices / questions: (a) The file /proc/self/interpreter is always present, and in most cases, points to nothing (since we're running an ELF binary, with no interpreter!). Is it OK? The implementation follows the pattern of /proc/self/exe_file, returns -ENOENT when we have no interpreter. I'm not sure if there's a way to implement this as a procfs file that is not always visible, i.e., it would only show up if such interpreter exists. If that would be possible, is it better than the current implementation though? Also, we could somehow make /proc/self/interpreter points to the LD preloader for ELFs, if that's considered a better approach. (b) I like xmacros to avoid repeating code, but I'm totally fine changing that as well as naming of stuff. (c) Should we extend functionality present on exe_file to the interpreter, like prctl replacing mechanism, audit info collection, etc? 

Any feedback here is greatly appreciated - thanks in advance!
Cheers,

Guilherme

 fs/exec.c                |  8 +++++++
 fs/proc/base.c           | 48 ++++++++++++++++++++++++++--------------
 include/linux/binfmts.h  |  1 +
 include/linux/mm.h       |  1 +
 include/linux/mm_types.h |  1 +
 kernel/fork.c            | 26 ++++++++++++++++++++++
 6 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index bb1574f37b67..39f9c86d5ebc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1285,6 +1285,13 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	/*
+	 * In case we're in an interpreted scenario (like binfmt_misc,
+	 * for example), flag it on mm_struct in order to expose such
+	 * interpreter as the symlink /proc/self/interpreter.
+	 */
+	bprm->mm->has_interpreter = bprm->has_interpreter;
+
 	/* If the binary is not readable then enforce mm->dumpable=0 */
 	would_dump(bprm, bprm->file);
 	if (bprm->have_execfd)
@@ -1796,6 +1803,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 
 		exec = bprm->file;
 		bprm->file = bprm->interpreter;
+		bprm->has_interpreter = true;
 		bprm->interpreter = NULL;
 
 		allow_write_access(exec);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a13fbfc46997..ecd5ea05acb0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1719,25 +1719,39 @@ static const struct file_operations proc_pid_set_comm_operations = {
 	.release = single_release,
 };
 
-static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
+static struct file *proc_get_task_exe_file(struct task_struct *task)
 {
-	struct task_struct *task;
-	struct file *exe_file;
-
-	task = get_proc_task(d_inode(dentry));
-	if (!task)
-		return -ENOENT;
-	exe_file = get_task_exe_file(task, true);
-	put_task_struct(task);
-	if (exe_file) {
-		*exe_path = exe_file->f_path;
-		path_get(&exe_file->f_path);
-		fput(exe_file);
-		return 0;
-	} else
-		return -ENOENT;
+	return get_task_exe_file(task, true);
 }
 
+static struct file *proc_get_task_interpreter_file(struct task_struct *task)
+{
+	return get_task_interpreter_file(task);
+}
+
+/* Definition of proc_exe_link and proc_interpreter_link. */
+#define PROC_GET_LINK_FUNC(type) \
+static int proc_##type##_link(struct dentry *dentry, struct path *path) \
+{ \
+	struct task_struct *task; \
+	struct file *file; \
+ \
+	task = get_proc_task(d_inode(dentry)); \
+	if (!task) \
+		return -ENOENT; \
+	file = proc_get_task_##type##_file(task); \
+	put_task_struct(task); \
+	if (file) { \
+		*path = file->f_path; \
+		path_get(&file->f_path); \
+		fput(file); \
+		return 0; \
+	} else \
+		return -ENOENT; \
+}
+PROC_GET_LINK_FUNC(exe);
+PROC_GET_LINK_FUNC(interpreter);
+
 static const char *proc_pid_get_link(struct dentry *dentry,
 				     struct inode *inode,
 				     struct delayed_call *done)
@@ -3276,6 +3290,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	LNK("cwd", proc_cwd_link),
 	LNK("root", proc_root_link),
 	LNK("exe", proc_exe_link),
+	LNK("interpreter", proc_interpreter_link),
 	REG("mounts", S_IRUGO, proc_mounts_operations),
 	REG("mountinfo", S_IRUGO, proc_mountinfo_operations),
 	REG("mountstats", S_IRUSR, proc_mountstats_operations),
@@ -3626,6 +3641,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	LNK("cwd", proc_cwd_link),
 	LNK("root", proc_root_link),
 	LNK("exe", proc_exe_link),
+	LNK("interpreter", proc_interpreter_link),
 	REG("mounts", S_IRUGO, proc_mounts_operations),
 	REG("mountinfo", S_IRUGO, proc_mountinfo_operations),
 #ifdef CONFIG_PROC_PAGE_MONITOR
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 5dde52de7877..2362c6bc6ead 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -46,6 +46,7 @@ struct linux_binprm {
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
 	struct file *interpreted_file; /* only for binfmt_misc with flag I */
+	bool has_interpreter; /* In order to expose /proc/self/interpreter */
 	struct file *file;
 	struct cred *cred; /* new credentials */
 	int unsafe; /* how unsafe this exec is (mask of LSM_UNSAFE_*) */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00b32906604..e06b703db494 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3261,6 +3261,7 @@ extern int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
 extern struct file *get_task_exe_file(struct task_struct *task,
 				      bool prefer_interpreted);
+extern struct file *get_task_interpreter_file(struct task_struct *task);
 
 extern bool may_expand_vm(struct mm_struct *, vm_flags_t, unsigned long npages);
 extern void vm_stat_account(struct mm_struct *, vm_flags_t, long npages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 346f81875f3e..19a73c41991c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -843,6 +843,7 @@ struct mm_struct {
 		/* store ref to file /proc/<pid>/exe symlink points to */
 		struct file __rcu *exe_file;
 		struct file __rcu *interpreted_file; /* see binfmt_misc flag I */
+		bool has_interpreter; /* exposes (or not) /proc/self/interpreter */
 #ifdef CONFIG_MMU_NOTIFIER
 		struct mmu_notifier_subscriptions *notifier_subscriptions;
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c4824dcc433..5cb542f92d5e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1568,6 +1568,32 @@ struct file *get_task_exe_file(struct task_struct *task,
 	return exe_file;
 }
 
+/**
+ * get_task_interpreter_file - acquire a reference to the task's *interpreter*
+ * executable (which is in fact the exe_file on mm_struct!). This is used in
+ * order to expose /proc/self/interpreter, if we're under an interpreted
+ * scenario (like binfmt_misc).
+ *
+ * Returns %NULL if exe_file is not an interpreter (i.e., it is the truly
+ * running binary indeed).
+ */
+struct file *get_task_interpreter_file(struct task_struct *task)
+{
+	struct file *interpreter_file = NULL;
+	struct mm_struct *mm;
+
+	task_lock(task);
+
+	mm = task->mm;
+	if (mm && mm->has_interpreter) {
+		if (!(task->flags & PF_KTHREAD))
+			interpreter_file = get_mm_exe_file(mm);
+	}
+
+	task_unlock(task);
+	return interpreter_file;
+}
+
 /**
  * get_task_mm - acquire a reference to the task's mm
  *