From patchwork Thu Jan 17 00:32:58 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 10767293
From: Rick Edgecombe
To: Andy Lutomirski, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
	Thomas Gleixner, Borislav Petkov, Nadav Amit, Dave Hansen,
	Peter Zijlstra, linux_dti@icloud.com, linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org, akpm@linux-foundation.org,
	kernel-hardening@lists.openwall.com, linux-mm@kvack.org,
	will.deacon@arm.com, ard.biesheuvel@linaro.org, kristen@linux.intel.com,
	deneen.t.dock@intel.com, Rick Edgecombe, Rusty Russell,
	Masami Hiramatsu, Daniel Borkmann, Alexei Starovoitov, Jessica Yu,
	Steven Rostedt, "Paul E. McKenney"
Subject: [PATCH 16/17] Plug in new special vfree flag
Date: Wed, 16 Jan 2019 16:32:58 -0800
Message-Id: <20190117003259.23141-17-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190117003259.23141-1-rick.p.edgecombe@intel.com>
References: <20190117003259.23141-1-rick.p.edgecombe@intel.com>

Add the new flag for handling the freeing of special-permissioned memory
in vmalloc, and remove the places where memory was set RW before being
freed, which is no longer needed.

In kprobes, bpf and ftrace this just adds the flag and removes the
now-unneeded set_memory_*() calls before calling vfree().

In modules, the freeing of init sections is moved to a work queue, since
vmalloc does not support freeing RO memory in an interrupt. Instead of
using call_rcu(), the work item now calls synchronize_rcu() itself before
freeing.

Cc: Rusty Russell
Cc: Masami Hiramatsu
Cc: Daniel Borkmann
Cc: Alexei Starovoitov
Cc: Jessica Yu
Cc: Steven Rostedt
Cc: Paul E. McKenney
Signed-off-by: Rick Edgecombe
Acked-by: Steven Rostedt (VMware)
---
 arch/x86/kernel/ftrace.c       |  6 +--
 arch/x86/kernel/kprobes/core.c |  7 +---
 include/linux/filter.h         | 16 ++-----
 kernel/bpf/core.c              |  1 -
 kernel/module.c                | 77 +++++++++++++++++-----------------
 5 files changed, 45 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index eb4a1937e72c..47597e028346 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -692,10 +692,6 @@ static inline void *alloc_tramp(unsigned long size)
 }
 static inline void tramp_free(void *tramp, int size)
 {
-	int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-
-	set_memory_nx((unsigned long)tramp, npages);
-	set_memory_rw((unsigned long)tramp, npages);
 	module_memfree(tramp);
 }
 #else
@@ -820,6 +816,8 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	/* ALLOC_TRAMP flags lets us know we created it */
 	ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
 
+	set_vm_special(trampoline);
+
 	/*
 	 * Module allocation needs to be completed by making the page
 	 * executable. The page is still writable, which is a security hazard,
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index fac692e36833..f2fab35bcb82 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -434,6 +434,7 @@ void *alloc_insn_page(void)
 	if (page == NULL)
 		return NULL;
 
+	set_vm_special(page);
 	/*
 	 * First make the page read-only, and then only then make it executable
 	 * to prevent it from being W+X in between.
@@ -452,12 +453,6 @@ void *alloc_insn_page(void)
 /* Recover page to RW mode before releasing it */
 void free_insn_page(void *page)
 {
-	/*
-	 * First make the page non-executable, and then only then make it
-	 * writable to prevent it from being W+X in between.
-	 */
-	set_memory_nx((unsigned long)page, 1);
-	set_memory_rw((unsigned long)page, 1);
 	module_memfree(page);
 }
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index f18cd317faf8..0abe812e7b75 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -483,7 +484,6 @@ struct bpf_prog {
 	u16			pages;		/* Number of allocated pages */
 	u16			jited:1,	/* Is our filter JIT'ed? */
 				jit_requested:1,/* archs need to JIT the prog */
-				undo_set_mem:1,	/* Passed set_memory_ro() checkpoint */
 				gpl_compatible:1, /* Is filter GPL compatible? */
 				cb_access:1,	/* Is control block accessed? */
 				dst_needed:1,	/* Do we need dst entry? */
@@ -681,26 +681,17 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 {
+	set_vm_special(fp);
 	set_memory_ro((unsigned long)fp, fp->pages);
 }
 
-static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
-{
-	if (fp->undo_set_mem)
-		set_memory_rw((unsigned long)fp, fp->pages);
-}
-
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
+	set_vm_special(hdr);
 	set_memory_ro((unsigned long)hdr, hdr->pages);
 	set_memory_x((unsigned long)hdr, hdr->pages);
 }
 
-static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
-{
-	set_memory_rw((unsigned long)hdr, hdr->pages);
-}
-
 static inline struct bpf_binary_header *
 bpf_jit_binary_hdr(const struct bpf_prog *fp)
 {
@@ -735,7 +726,6 @@ void __bpf_prog_free(struct bpf_prog *fp);
 
 static inline void bpf_prog_unlock_free(struct bpf_prog *fp)
 {
-	bpf_prog_unlock_ro(fp);
 	__bpf_prog_free(fp);
 }
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index f908b9356025..a1a4d6f4253c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -804,7 +804,6 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
 	if (fp->jited) {
 		struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
 
-		bpf_jit_binary_unlock_ro(hdr);
 		bpf_jit_binary_free(hdr);
 
 		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
diff --git a/kernel/module.c b/kernel/module.c
index ae1b77da6a20..1af5c8e19086 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -98,6 +98,10 @@ DEFINE_MUTEX(module_mutex);
 EXPORT_SYMBOL_GPL(module_mutex);
 static LIST_HEAD(modules);
 
+/* Work queue for freeing init sections in success case */
+static struct work_struct init_free_wq;
+static struct llist_head init_free_list;
+
 #ifdef CONFIG_MODULES_TREE_LOOKUP
 
 /*
@@ -1949,6 +1953,8 @@ void module_enable_ro(const struct module *mod, bool after_init)
 	if (!rodata_enabled)
 		return;
 
+	set_vm_special(mod->core_layout.base);
+	set_vm_special(mod->init_layout.base);
 	frob_text(&mod->core_layout, set_memory_ro);
 	frob_text(&mod->core_layout, set_memory_x);
 
@@ -1972,15 +1978,6 @@ static void module_enable_nx(const struct module *mod)
 	frob_writable_data(&mod->init_layout, set_memory_nx);
 }
 
-static void module_disable_nx(const struct module *mod)
-{
-	frob_rodata(&mod->core_layout, set_memory_x);
-	frob_ro_after_init(&mod->core_layout, set_memory_x);
-	frob_writable_data(&mod->core_layout, set_memory_x);
-	frob_rodata(&mod->init_layout, set_memory_x);
-	frob_writable_data(&mod->init_layout, set_memory_x);
-}
-
 /* Iterate through all modules and set each module's text as RW */
 void set_all_modules_text_rw(void)
 {
@@ -2024,23 +2021,8 @@ void set_all_modules_text_ro(void)
 	}
 	mutex_unlock(&module_mutex);
 }
-
-static void disable_ro_nx(const struct module_layout *layout)
-{
-	if (rodata_enabled) {
-		frob_text(layout, set_memory_rw);
-		frob_rodata(layout, set_memory_rw);
-		frob_ro_after_init(layout, set_memory_rw);
-	}
-	frob_rodata(layout, set_memory_x);
-	frob_ro_after_init(layout, set_memory_x);
-	frob_writable_data(layout, set_memory_x);
-}
-
 #else
-static void disable_ro_nx(const struct module_layout *layout) { }
 static void module_enable_nx(const struct module *mod) { }
-static void module_disable_nx(const struct module *mod) { }
 #endif
 
 #ifdef CONFIG_LIVEPATCH
@@ -2120,6 +2102,11 @@ static void free_module_elf(struct module *mod)
 
 void __weak module_memfree(void *module_region)
 {
+	/*
+	 * This memory may be RO, and freeing RO memory in an interrupt is not
+	 * supported by vmalloc.
+	 */
+	WARN_ON(in_interrupt());
 	vfree(module_region);
 }
 
@@ -2171,7 +2158,6 @@ static void free_module(struct module *mod)
 	mutex_unlock(&module_mutex);
 
 	/* This may be empty, but that's OK */
-	disable_ro_nx(&mod->init_layout);
 	module_arch_freeing_init(mod);
 	module_memfree(mod->init_layout.base);
 	kfree(mod->args);
@@ -2181,7 +2167,6 @@ static void free_module(struct module *mod)
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
-	disable_ro_nx(&mod->core_layout);
 	module_memfree(mod->core_layout.base);
 }
 
@@ -3424,17 +3409,34 @@ static void do_mod_ctors(struct module *mod)
 
 /* For freeing module_init on success, in case kallsyms traversing */
 struct mod_initfree {
-	struct rcu_head rcu;
+	struct llist_node node;
 	void *module_init;
 };
 
-static void do_free_init(struct rcu_head *head)
+static void do_free_init(struct work_struct *w)
 {
-	struct mod_initfree *m = container_of(head, struct mod_initfree, rcu);
-	module_memfree(m->module_init);
-	kfree(m);
+	struct llist_node *pos, *n, *list;
+	struct mod_initfree *initfree;
+
+	list = llist_del_all(&init_free_list);
+
+	synchronize_rcu();
+
+	llist_for_each_safe(pos, n, list) {
+		initfree = container_of(pos, struct mod_initfree, node);
+		module_memfree(initfree->module_init);
+		kfree(initfree);
+	}
 }
 
+static int __init modules_wq_init(void)
+{
+	INIT_WORK(&init_free_wq, do_free_init);
+	init_llist_head(&init_free_list);
+	return 0;
+}
+module_init(modules_wq_init);
+
 /*
  * This is where the real work happens.
  *
@@ -3511,7 +3513,6 @@ static noinline int do_init_module(struct module *mod)
 #endif
 	module_enable_ro(mod, true);
 	mod_tree_remove_init(mod);
-	disable_ro_nx(&mod->init_layout);
 	module_arch_freeing_init(mod);
 	mod->init_layout.base = NULL;
 	mod->init_layout.size = 0;
@@ -3522,14 +3523,18 @@ static noinline int do_init_module(struct module *mod)
 	 * We want to free module_init, but be aware that kallsyms may be
 	 * walking this with preempt disabled. In all the failure paths, we
 	 * call synchronize_rcu(), but we don't want to slow down the success
-	 * path, so use actual RCU here.
+	 * path. We can't do module_memfree in an interrupt, so we do the work
+	 * and call synchronize_rcu() in a work queue.
+	 *
 	 * Note that module_alloc() on most architectures creates W+X page
 	 * mappings which won't be cleaned up until do_free_init() runs. Any
 	 * code such as mark_rodata_ro() which depends on those mappings to
 	 * be cleaned up needs to sync with the queued work - ie
 	 * rcu_barrier()
 	 */
-	call_rcu(&freeinit->rcu, do_free_init);
+	if (llist_add(&freeinit->node, &init_free_list))
+		schedule_work(&init_free_wq);
+
 	mutex_unlock(&module_mutex);
 	wake_up_all(&module_wq);
 
@@ -3826,10 +3831,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	module_bug_cleanup(mod);
 	mutex_unlock(&module_mutex);
 
-	/* we can't deallocate the module until we clear memory protection */
-	module_disable_ro(mod);
-	module_disable_nx(mod);
-
  ddebug_cleanup:
 	ftrace_release_mod(mod);
 	dynamic_debug_remove(mod, info->debug);